TENSOR RANK AND THE ILL-POSEDNESS OF THE BEST 
LOW-RANK APPROXIMATION PROBLEM 

Abstract. There has been continued interest in seeking a theorem describing optimal low-rank
approximations to tensors of order 3 or higher, that parallels the Eckart-Young theorem for matrices.
In this paper, we argue that the naive approach to this problem is doomed to failure because, unlike 
matrices, tensors of order 3 or higher can fail to have best rank-r approximations. The phenomenon 
is much more widespread than one might suspect: examples of this failure can be constructed over 
a wide range of dimensions, orders and ranks, regardless of the choice of norm (or even Bregman 
divergence). Moreover, we show that in many instances these counterexamples have positive volume: 
they cannot be regarded as isolated phenomena. In one extreme case, we exhibit a tensor space 
in which no rank-3 tensor has an optimal rank-2 approximation. The notable exceptions to this 
misbehavior are rank-1 tensors and order-2 tensors (i.e. matrices). 

In a more positive spirit, we propose a natural way of overcoming the ill-posedness of the low-rank 
approximation problem, by using weak solutions when true solutions do not exist. For this to work, 
it is necessary to characterize the set of weak solutions, and we do this in the case of rank 2, order 3 
(in arbitrary dimensions). In our work we emphasize the importance of closely studying concrete 
low-dimensional examples as a first step towards more general results. To this end, we present a 
detailed analysis of equivalence classes of $2 \times 2 \times 2$ tensors, and we develop methods for extending
results upwards to higher orders and dimensions. 

Finally, we link our work to existing studies of tensors from an algebraic geometric point of view. 
The rank of a tensor can in theory be given a semialgebraic description; in other words, can be 
determined by a system of polynomial inequalities. We study some of these polynomials in cases of 
interest to us; in particular we make extensive use of the hyperdeterminant $\Delta$ on $\mathbb{R}^{2 \times 2 \times 2}$.

1. Introduction. Given an order-$k$ tensor $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$, one is often required
to find a best rank-$r$ approximation to $A$; in other words, determine vectors
$x_i \in \mathbb{R}^{d_1}$, $y_i \in \mathbb{R}^{d_2}$, $\dots$, $z_i \in \mathbb{R}^{d_k}$, $i = 1, \dots, r$, that minimize

$$\|A - x_1 \otimes y_1 \otimes \cdots \otimes z_1 - \cdots - x_r \otimes y_r \otimes \cdots \otimes z_r\|$$

or, in short,

$$\operatorname{argmin}_{\operatorname{rank}_\otimes(B) \le r} \|A - B\|. \tag{APPROX(A, r)}$$



Here $\|\cdot\|$ denotes some choice of norm on $\mathbb{R}^{d_1 \times \cdots \times d_k}$. When $k = 2$, the problem is
completely resolved for unitarily invariant norms on $\mathbb{R}^{m \times n}$ with the Eckart-Young
theorem [28], which states that if

$$A = \sum_{i=1}^{\operatorname{rank}(A)} \sigma_i u_i \otimes v_i, \qquad \sigma_1 \ge \sigma_2 \ge \cdots,$$

is the singular value decomposition of $A \in \mathbb{R}^{m \times n}$, then a best rank-$r$ approximation
is given by the first $r$ terms in the above sum [33]. The best rank-$r$ approximation
problem for higher order tensors is a problem of central importance in the statistical
analysis of multiway data [IIl[M[13[lIlll3[S3[M[ni[S3[S3[71[751[75]. 
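
For the matrix case $k = 2$, the Eckart-Young statement above can be checked numerically. The following is a minimal sketch, assuming NumPy and the Frobenius norm (one particular unitarily invariant norm); the helper name `best_rank_r` and the random test matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def best_rank_r(A, r):
    """Keep the first r terms of the SVD of A (Eckart-Young truncation)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))   # arbitrary test matrix (assumption)
r = 2
B = best_rank_r(A, r)

# In the Frobenius norm the error equals the root sum of squares
# of the discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - B)
expected = np.sqrt(np.sum(s[r:] ** 2))
```

Any other matrix of rank at most $r$ should give an error no smaller than `err`; the point of the paper is that no analogous guarantee is available for tensors of order 3 or higher.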

It is therefore not surprising that there has been continued interest in finding a
satisfactory 'singular value decomposition' and an 'Eckart-Young theorem'-like result
for tensors of higher order. The view expressed in the conclusion of [46] is
representative of such efforts and we reproduce it here:

"An Eckart-Young type of best rank-r approximation theorem for
tensors continues to elude our investigations but can perhaps
eventually be attained by using a different norm or yet other definitions of
orthogonality and rank."

It will perhaps come as a surprise to the reader that the problem of finding an
'Eckart-Young type theorem' is ill-founded because of a more fundamental difficulty:
the best rank-r approximation problem APPROX(A, r) has no solution in general! This
paper seeks to provide an answer to this and several related questions.

1.1. Summary. Since this is a long paper, we present an 'executive summary' of 
selected results, in this section and the next. We begin with the five main objectives 
of this article: 

1. APPROX(A, r) is ill-posed for many r. We will show that, regardless
of the choice of norm, the problem of determining a best rank-$r$
approximation for an order-$k$ tensor in $\mathbb{R}^{d_1 \times \cdots \times d_k}$ has no solution in general for
$r = 2, \dots, \min\{d_1, \dots, d_k\}$ and $k \ge 3$. In other words, the best low rank
approximation problem for tensors is ill-posed for all orders (higher than 2),
all norms, and many ranks.

2. APPROX(A, r) is ill-posed for many A. We will show that the set of tensors
that fail to have a best low rank approximation has positive volume. In other
words, such failures are not rare: if one randomly picks a tensor $A$ in a
suitable tensor space, then there is a non-zero probability that $A$ will fail to
have a best rank-$r$ approximation for some $r < \operatorname{rank}_\otimes(A)$.

3. Weak solutions to APPROX(A, r). We will propose a natural way to
overcome the ill-posedness of the best rank-$r$ approximation problem with the
introduction of 'weak solutions', which we explicitly characterize in the case
$r = 2$, $k = 3$.

4. Semialgebraic description of tensor rank. From the Tarski-Seidenberg
theorem in model theory [7IJ El] we will deduce the following: for any
$d_1, \dots, d_k$, there exists a finite number of polynomial functions, $\Delta_1, \dots, \Delta_m$,
defined on $\mathbb{R}^{d_1 \times \cdots \times d_k}$ such that the rank of any $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ is completely
determined by the signs of $\Delta_1(A), \dots, \Delta_m(A)$. We work this out in the special
case $\mathbb{R}^{2 \times 2 \times 2}$.

5. Reduction. We will give techniques for reducing certain questions about
tensors (orbits, invariants, limits) from high-dimensional tensor spaces to
lower-dimensional tensor spaces. For instance, if two tensors in $\mathbb{R}^{c_1 \times \cdots \times c_k}$ lie
in distinct $\mathrm{GL}_{c_1,\dots,c_k}(\mathbb{R})$-orbits, then they lie in distinct $\mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$-orbits
in $\mathbb{R}^{d_1 \times \cdots \times d_k}$ for any $d_i \ge c_i$.

The first objective is formally stated and proved in Theorem 4.10. The two notable
exceptions where APPROX(A, r) has a solution are the cases $r = 1$ (approximation by
rank-1 tensors) and $k = 2$ ($A$ is a matrix). The standard way to prove these assertions
is to use brute force: show that the sets where the approximators are to be found may
be defined by polynomial equations. We will provide alternative elementary proofs of
these results in Propositions 4.2 and 4.3 (see also Proposition 4.4).

The second objective is proved in Theorem 8.4, which holds true on $\mathbb{R}^{d_1 \times d_2 \times d_3}$ for
arbitrary $d_1, d_2, d_3 \ge 2$. Stronger results can hold in specific cases: in Theorem 8.1
we will give an instance where every rank-$r$ tensor fails to have a best rank-$(r - 1)$
approximator.

The third objective is primarily possible because of the following theorem, which
asserts that the boundary of the set of rank-2 tensors can be explicitly parameterized.
The proof, and a discussion of weak solutions, is given in Section 5.

Theorem 1.1. Let $d_1, d_2, d_3 \ge 2$. Let $A_n \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ be a sequence of tensors
with $\operatorname{rank}_\otimes(A_n) \le 2$ and

$$\lim_{n \to \infty} A_n = A,$$

where the limit is taken in any norm topology. If the limiting tensor $A$ has rank higher
than 2, then $\operatorname{rank}_\otimes(A)$ must be exactly 3 and there exist pairs of linearly independent
vectors $x_1, y_1 \in \mathbb{R}^{d_1}$, $x_2, y_2 \in \mathbb{R}^{d_2}$, $x_3, y_3 \in \mathbb{R}^{d_3}$ such that

$$A = x_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes x_3. \tag{1.1}$$

Furthermore, the above result is not vacuous since

$$A_n = n\left(x_1 + \frac{1}{n} y_1\right) \otimes \left(x_2 + \frac{1}{n} y_2\right) \otimes \left(x_3 + \frac{1}{n} y_3\right) - n\, x_1 \otimes x_2 \otimes x_3$$

is an example of a sequence that converges to $A$.

A few conclusions can immediately be drawn from Theorem 1.1: (i) the boundary
points of all order-3 rank-2 tensors can be completely parameterized by (1.1); (ii) a
sequence of order-3 rank-2 tensors cannot 'jump rank' by more than 1; (iii) $A$ in (1.1),
in particular, is an example of a tensor that has no best rank-2 approximation.
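
The convergence of the rank-2 sequence $A_n$ to the tensor $A$ of Theorem 1.1 can be observed numerically. The sketch below assumes NumPy, the Frobenius norm, and the illustrative choice $d_1 = d_2 = d_3 = 2$ with $x_i$, $y_i$ the standard basis vectors; the helper names `outer3` and `A_n` are ours.

```python
import numpy as np

def outer3(a, b, c):
    """Outer product of three vectors as a 3-way array."""
    return np.einsum("i,j,k->ijk", a, b, c)

# Illustrative choice: x_i = e_1 and y_i = e_2 in R^2 for i = 1, 2, 3,
# so each pair (x_i, y_i) is linearly independent.
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# The limiting tensor: x (x) x (x) y + x (x) y (x) x + y (x) x (x) x
A = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)

def A_n(n):
    """Difference of two decomposable tensors, each of norm of order n."""
    u = x + y / n
    return n * outer3(u, u, u) - n * outer3(x, x, x)

ns = (1, 10, 100, 1000)
errors = [np.linalg.norm(A_n(n) - A) for n in ns]
term_norms = [np.linalg.norm(n * outer3(x, x, x)) for n in ns]
```

The entries of `errors` shrink roughly like $1/n$ while the entries of `term_norms` grow like $n$: the approximation improves only because the two rank-1 summands diverge and nearly cancel.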

The formal statements and proofs of the fourth objective appear in Section 6.
The fifth objective is exemplified by our approach throughout the paper; some specific
technical tools are discussed in Sections 5.1 and 7.5.

On top of these five objectives, we pick up the following smaller results along the
way. Some of these results address frequently asked questions in tensor approximation.
They are discussed in Sections 4.3-4.7, respectively.

6. Divergence of coefficients. Whenever a low-rank sequence of tensors con- 
verges to a higher-rank tensor, some of the terms in the sequence must blow 
up. In examples of minimal rank, all the terms blow up. 

7. Maximum rank. For $k \ge 3$, the maximum rank of an order-$k$ tensor in
$\mathbb{R}^{d_1 \times \cdots \times d_k}$ (where $d_i \ge 2$) always exceeds $\min(d_1, \dots, d_k)$. In contrast, for
matrices $\min(d_1, d_2)$ does bound the rank.

8. Tensor rank can leap large gaps. Conclusion (ii) in the paragraph above 
does not generalize to rank r > 2. We will show that a sequence of fixed rank 
tensors can converge to a limiting tensor of arbitrarily higher rank. 

9. Bregman divergences do not help. If we replace norm by any
continuous measure of 'nearness' (including non-metric measures like Bregman
divergences), it does not change the ill-foundedness of APPROX(A, r).

10. Leibniz tensors. We will construct a rich family of sequences of tensors
with degenerate limits, labeled by partial derivative operators. The special
case $L_3(1)$ is in fact our principal example (1.1) throughout this paper.

1.2. Relation to prior work. The existence of tensors that can fail to have
a best rank-r approximation has been known to algebraic geometers since as early as the 19th
century, albeit in a different language: the locus of rth secant planes to a Segre
variety may not define a (closed) algebraic variety. It is also known to
computational complexity theorists as the phenomenon underlying the concept of border rank
[5j El Q21 S3 [54] and is related to (but different from) what chemometricians and 
psychometricians call 'candecomp/parafac degeneracy' [49J ETJ EZl EZl [68] • We do 
not claim to be the first to have found such an example; the honor belongs to Bini,
Capovani, Lotti, and Romani, who gave an explicit example of a sequence of rank-5
tensors converging to a rank-6 tensor in 1979 [7]. The novelty of Theorem 1.1 is not
in demonstrating that a tensor may be approximated arbitrarily well by tensors of
strictly lower rank but in characterizing all such tensors in the order-3 rank-2 case.

Having said this, we would like to point out that the ill-posedness of the best
rank-r approximation problem for high-order tensors is not at all well-known, as is evident
from the paragraph quoted earlier as well as other discussions in recent publications
[44] [45] [46] [47] [80]. One likely reason is that in algebraic geometry, computational
complexity, chemometrics, and psychometrics, the problem is neither stated in the
form nor viewed in the light of obtaining a best low-rank approximation with
respect to a choice of norm (we give several equivalent formulations of APPROX(A, r)
in Proposition 4.1). As such, one goal of this paper will be to debunk, once and for
all, the question of finding best low-rank approximations for tensors of order 3 or
higher. As we stated earlier (as our first and second objectives), our contribution will
be to show that such failures (i) can and will occur for tensors of any order higher
than 2, (ii) will occur for tensors of many different ranks, (iii) will occur
regardless of the choice of norm, and (iv) will occur with non-zero
probability. Formally, we have the following two theorems (which will appear as
Theorems 4.10 and 8.4 subsequently):

Theorem 1.2. Let $k \ge 3$ and $d_1, \dots, d_k \ge 2$. For any $s$ such that $2 \le s \le
\min\{d_1, \dots, d_k\}$, there exists $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ with $\operatorname{rank}_\otimes(A) = s$ such that $A$ has no
best rank-$r$ approximation for some $r < s$. The result is independent of the choice of
norms.

Theorem 1.3. If $d_1, d_2, d_3 \ge 2$, then the set

$$\{A \in \mathbb{R}^{d_1 \times d_2 \times d_3} \mid A \text{ does not have a best rank-2 approximation}\}$$

has positive volume; indeed, it contains a nonempty open set.

A few features distinguish our work in this paper from existing studies in algebraic 
geometry [T21 [T21 [Ml (Ml EI] and algebraic computational complexity [H [21 G3 El 
[H H21 EH]: (i) we are interested in tensors over $\mathbb{R}$ as opposed to tensors over $\mathbb{C}$ (it is
well-known that the rank of a tensor is dependent on the underlying field, cf. (7.5) and
[1]); (ii) our interest is not limited to order-3 tensors (as is often the case in algebraic
computational complexity); we would like to prove results that hold for tensors of
any order $k \ge 3$; (iii) since we are interested in questions pertaining to approximations
in the norm, the Euclidean (norm-induced) topology will be more relevant than the
Zariski topology¹ on the tensor product spaces. Note in particular that the claim
that a set is not closed in the Euclidean topology is a stronger statement than the
corresponding claim in Zariski topology.

¹ Note that the Zariski topology on $k^n$ is defined for any field $k$ (not just algebraically closed
ones). It is the weakest topology such that all polynomial functions are continuous. In particular,
the closed sets are precisely the zero sets of collections of polynomials.

Our work in this paper in general, and in Section 4.2 in particular, is related to
studies of 'candecomp/parafac degeneracy' or 'diverging CANDECOMP/PARAFAC
components' in psychometrics and chemometrics |49[ [5TI R)2l IBTI I68j . Diverging
coefficients are a necessary consequence of the ill-posedness of APPROX(A, r) (see
Propositions 4.8 and 4.9). In fact, examples of '$k$-factor divergence' abound, for arbitrary $k$;
see Sections 4.4 and 4.7 for various constructions.

Section 5.4 discusses how the non-existence of a best rank-r approximation poses
serious difficulties for multilinear statistical models based on such approximations. In 
particular, we will see: (i) why it is meaningless to ask for a 'good' rank-r approxima- 
tion when a best rank-r approximation does not exist; (ii) why even a small perturba- 
tion to a rank-r tensor can result in a tensor that has no best rank-r approximation; 
(iii) why the computational feasibility of finding a 'good' rank-r approximation is 
questionable. 

1.3. Outline of the paper. Section 2 introduces the basic algebra of tensors
and $k$-way arrays. Section 3 defines tensor rank and gives some of its known (and
unknown) algebraic properties. Section 4 studies the topological properties of tensor
rank and the phenomenon of rank-jumping. Section 5 characterizes the problematic
tensors in $\mathbb{R}^{2 \times 2 \times 2}$, and discusses the implications for approximation problems.
Section 6 gives a short exposition of the semialgebraic point of view. Section 7 classifies
tensors in $\mathbb{R}^{2 \times 2 \times 2}$ by orbit type. The orbit structure of tensor spaces is studied from
several different aspects. Section 8 is devoted to the result that failure of APPROX(A, 2)
occurs on a set of positive volume.

2. Tensors. Even though tensors are well-studied objects in the standard
graduate mathematics curriculum [TJ [571 SB HH H3] and more specifically in multilinear
algebra SH HH [U [75], a 'tensor' continues to be viewed as a mysterious object by
outsiders. We feel that we should say a few words to demystify the term.

In mathematics, the question 'what is a vector?' has the simple answer 'a vector 
is an element of a vector space' — in other words, a vector is characterized by the 
axioms that define the algebraic operations on a vector space. In physics, however, 
the question 'what is a vector?' often means 'what kinds of physical quantities can 
be represented by vectors?' The criterion has to do with the change of basis theorem: 
an n-dimensional vector is an 'object' that is represented by n real numbers once a 
basis is chosen only if those real numbers transform themselves as expected when one 
changes the basis. For exactly the same reason, the meaning of a tensor is obscured 
by its more restrictive use in physics. In physics (and also engineering), a tensor is an
'object' represented by a $k$-way array of real numbers that transforms according to
certain rules (cf. (2.2)) under a change of basis. In mathematics, these 'transformation
rules' are simply consequences of the multilinearity of the tensor product and the
change of basis theorem for vectors. Nowadays, books written primarily for a physics
audience [32] [60] have increasingly adopted the mathematical definition, but a handful
of recently published books continue to propagate the obsolete (and vague) definition.
To add to the confusion, 'tensor' is frequently used to refer to a tensor field (e.g. metric 
tensor, stress tensor, Riemann curvature tensor). 

For our purposes, an order-$k$ tensor $\mathbf{A}$ is simply an element of a tensor product of
$k$ real vector spaces, $V_1 \otimes V_2 \otimes \cdots \otimes V_k$, as defined in any standard algebra textbook
[Il[9l[27J[Ml[ffl[52l[5g[6Tl[Ml[78j. Up to a choice of bases on $V_1, \dots, V_k$, such an
element may be coordinatized, i.e. represented as a $k$-way array $A$ of real numbers,
much as an element of an $n$-dimensional vector space may, up to a choice of basis,
be represented by an $n$-tuple of numbers in $\mathbb{R}^n$. We will let $\mathbb{R}^{d_1 \times \cdots \times d_k}$ denote the
vector space of $k$-way arrays of real numbers $A = [\![a_{j_1 \cdots j_k}]\!]_{j_1, \dots, j_k = 1}^{d_1, \dots, d_k}$ with addition
and scalar multiplication defined coordinatewise:

$$[\![a_{j_1 \cdots j_k}]\!] + [\![b_{j_1 \cdots j_k}]\!] := [\![a_{j_1 \cdots j_k} + b_{j_1 \cdots j_k}]\!] \quad\text{and}\quad \lambda [\![a_{j_1 \cdots j_k}]\!] := [\![\lambda a_{j_1 \cdots j_k}]\!]. \tag{2.1}$$

A $k$-way array of numbers (or $k$-array) is also sometimes referred to as a $k$-dimensional
hypermatrix [30].

It may be helpful to think of a $k$-array as a data structure, convenient for
representing or storing the coefficients of a tensor with respect to a set of bases. The tensor
itself carries with it an algebraic structure, by virtue of being an element of a tensor
product of vector spaces. Once bases have been chosen for these vector spaces, we
may view the order-$k$ tensor as a $k$-way array equipped with the algebraic operations
defined in (2.1) and (2.3). Despite this correspondence, it is not wise to regard 'tensor'
as being synonymous with 'array'.

Notation. We will denote elements of abstract tensor spaces in boldface
upper-case letters, whereas $k$-arrays will be denoted in italic upper-case letters. Thus $\mathbf{A}$
is an abstract tensor, which may be represented by an array of numbers $A$ with
respect to a basis. We will use double brackets to enclose the entries of a $k$-array,
$A = [\![a_{j_1 \cdots j_k}]\!]_{j_1, \dots, j_k = 1}^{d_1, \dots, d_k}$, and when there is no risk of confusion, we will leave out
the range of the indices and simply write $A = [\![a_{j_1 \cdots j_k}]\!]$.

2.1. Multilinear matrix multiplication. Matrices can act on other
matrices through two independent multiplication operations: left-multiplication and
right-multiplication. Matrices act on order-3 tensors via three different multiplication
operations. These can be combined into a single formula. If $A = [\![a_{ijk}]\!] \in \mathbb{R}^{d_1 \times d_2 \times d_3}$
and $L = [\lambda_{pi}] \in \mathbb{R}^{c_1 \times d_1}$, $M = [\mu_{qj}] \in \mathbb{R}^{c_2 \times d_2}$, $N = [\nu_{rk}] \in \mathbb{R}^{c_3 \times d_3}$, then the array $A$
may be transformed into an array $A' = [\![a'_{pqr}]\!] \in \mathbb{R}^{c_1 \times c_2 \times c_3}$ by the equation

$$a'_{pqr} = \sum_{i,j,k=1}^{d_1, d_2, d_3} \lambda_{pi} \mu_{qj} \nu_{rk} a_{ijk}. \tag{2.2}$$

We call this operation the multilinear multiplication of $A$ by matrices $L, M, N$, which
we write succinctly as

$$A' = (L, M, N) \cdot A.$$

Informally, we are multiplying the 3-way array $A$ on its three 'sides' by the matrices
$L, M, N$ respectively.
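
Formula (2.2) translates almost verbatim into an Einstein-summation call. The following is a minimal sketch assuming NumPy; the function name `multilinear_multiply` and the dimensions used are illustrative choices.

```python
import numpy as np

def multilinear_multiply(L, M, N, A):
    """Compute A' = (L, M, N) . A as in (2.2):
    a'_{pqr} = sum_{i,j,k} L[p,i] * M[q,j] * N[r,k] * A[i,j,k]."""
    return np.einsum("pi,qj,rk,ijk->pqr", L, M, N, A)

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3, 4))   # d1 x d2 x d3
L = rng.standard_normal((5, 2))      # c1 x d1
M = rng.standard_normal((6, 3))      # c2 x d2
N = rng.standard_normal((7, 4))      # c3 x d3
A_prime = multilinear_multiply(L, M, N, A)

# Cross-check against the defining triple sum, written with explicit loops:
# each entry a_ijk contributes a_ijk times the outer product of the
# corresponding columns of L, M, N.
check = np.zeros((5, 6, 7))
for i in range(2):
    for j in range(3):
        for k in range(4):
            check += A[i, j, k] * np.einsum("p,q,r->pqr", L[:, i], M[:, j], N[:, k])
```

The result `A_prime` has shape `(5, 6, 7)`, i.e. $c_1 \times c_2 \times c_3$, and agrees with `check` up to rounding.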

Remark. This notation is standard in mathematics: the elements of a product
$G_1 \times G_2 \times G_3$ are generally grouped in the form $(L, M, N)$, and when a set with some
algebraic structure $G$ acts on another set $X$, the result of $g \in G$ acting on $x \in X$
is almost universally written $g \cdot x$ [TJ [HI [571 S3 EH [S3]. Here we are just looking at
the case $G = \mathbb{R}^{c_1 \times d_1} \times \mathbb{R}^{c_2 \times d_2} \times \mathbb{R}^{c_3 \times d_3}$ and $X = \mathbb{R}^{d_1 \times d_2 \times d_3}$. This is consistent with
notation adopted in earlier work [52] but more recent publications such as HI]
have used $A \times_1 L^\top \times_2 M^\top \times_3 N^\top$ in place of $(L, M, N) \cdot A$.

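Multilinear matrix multiplication extends in a straightforward way to arrays of
arbitrary order: if $A = [\![a_{j_1 \cdots j_k}]\!] \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $L_1 = [\lambda^{(1)}_{ij}] \in \mathbb{R}^{c_1 \times d_1}, \dots, L_k =
[\lambda^{(k)}_{ij}] \in \mathbb{R}^{c_k \times d_k}$, then $A' = (L_1, \dots, L_k) \cdot A$ is the array $A' = [\![a'_{i_1 \cdots i_k}]\!] \in \mathbb{R}^{c_1 \times \cdots \times c_k}$
given by

$$a'_{i_1 \cdots i_k} = \sum_{j_1, \dots, j_k = 1}^{d_1, \dots, d_k} \lambda^{(1)}_{i_1 j_1} \cdots \lambda^{(k)}_{i_k j_k} a_{j_1 \cdots j_k}. \tag{2.3}$$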
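
We will now see how a 3-way array representing a tensor in $V_1 \otimes V_2 \otimes V_3$ transforms
under changes of bases of the vector spaces $V_i$. Suppose the 3-way array $A = [\![a_{ijk}]\!] \in
\mathbb{R}^{d_1 \times d_2 \times d_3}$ represents an order-3 tensor $\mathbf{A} \in V_1 \otimes V_2 \otimes V_3$ with respect to bases $\mathcal{B}_1 =
\{e_i \mid i = 1, \dots, d_1\}$, $\mathcal{B}_2 = \{f_j \mid j = 1, \dots, d_2\}$, $\mathcal{B}_3 = \{g_k \mid k = 1, \dots, d_3\}$ on $V_1, V_2, V_3$,
i.e.

$$\mathbf{A} = \sum_{i,j,k=1}^{d_1, d_2, d_3} a_{ijk}\, e_i \otimes f_j \otimes g_k. \tag{2.4}$$

Suppose we choose different bases, $\mathcal{B}'_1 = \{e'_i \mid i = 1, \dots, d_1\}$, $\mathcal{B}'_2 = \{f'_j \mid j = 1, \dots, d_2\}$,
$\mathcal{B}'_3 = \{g'_k \mid k = 1, \dots, d_3\}$ on $V_1, V_2, V_3$ where

$$e_i = \sum_{p=1}^{d_1} \lambda_{pi}\, e'_p, \qquad f_j = \sum_{q=1}^{d_2} \mu_{qj}\, f'_q, \qquad g_k = \sum_{r=1}^{d_3} \nu_{rk}\, g'_r, \tag{2.5}$$

and $L = [\lambda_{pi}] \in \mathbb{R}^{d_1 \times d_1}$, $M = [\mu_{qj}] \in \mathbb{R}^{d_2 \times d_2}$, $N = [\nu_{rk}] \in \mathbb{R}^{d_3 \times d_3}$ are the respective
change-of-basis matrices. Substituting the expressions for (2.5) into (2.4), we get

$$\mathbf{A} = \sum_{p,q,r=1}^{d_1, d_2, d_3} a'_{pqr}\, e'_p \otimes f'_q \otimes g'_r$$

where

$$a'_{pqr} = \sum_{i,j,k=1}^{d_1, d_2, d_3} \lambda_{pi} \mu_{qj} \nu_{rk} a_{ijk}, \tag{2.6}$$

or more simply $A' = (L, M, N) \cdot A$, where the 3-way array $A' = [\![a'_{pqr}]\!] \in \mathbb{R}^{d_1 \times d_2 \times d_3}$
represents $\mathbf{A}$ with respect to this new choice of bases $\mathcal{B}'_1, \mathcal{B}'_2, \mathcal{B}'_3$.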

All of this extends immediately to order-$k$ tensors and $k$-way arrays. Henceforth,
when a choice of basis is implicit, we will not distinguish between an order-$k$ tensor
and the $k$-way array that represents it.
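
The change-of-basis rule (2.6) can be sanity-checked by modelling each $V_i$ concretely as $\mathbb{R}^{d_i}$ and storing basis vectors as matrix columns. This is a sketch under those assumptions, using NumPy; all names (`mlmul`, `E1p`, and so on) are illustrative, and the random square matrices are assumed invertible, which holds with probability 1.

```python
import numpy as np

def mlmul(L, M, N, A):
    """(L, M, N) . A for an order-3 array A."""
    return np.einsum("pi,qj,rk,ijk->pqr", L, M, N, A)

rng = np.random.default_rng(6)
d1, d2, d3 = 2, 3, 2
A = rng.standard_normal((d1, d2, d3))   # coefficients a_ijk in the old bases

# New bases e'_p, f'_q, g'_r: the columns of E1p, E2p, E3p
E1p, E2p, E3p = (rng.standard_normal((d, d)) for d in (d1, d2, d3))
# Change-of-basis matrices L, M, N
L, M, N = (rng.standard_normal((d, d)) for d in (d1, d2, d3))
# Old bases in terms of the new ones: e_i = sum_p L[p, i] * e'_p, i.e. E1 = E1p @ L
E1, E2, E3 = E1p @ L, E2p @ M, E3p @ N

A_new = mlmul(L, M, N, A)               # coefficients a'_pqr in the new bases

# The same abstract tensor written out in standard coordinates, from either basis
T_old = mlmul(E1, E2, E3, A)
T_new = mlmul(E1p, E2p, E3p, A_new)
```

Here `T_old` and `T_new` agree up to rounding, which is the statement that $A$ and $A' = (L, M, N) \cdot A$ represent the same tensor.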

The change-of-basis matrices $L, M, N$ in the discussion above are of course
invertible; in other words they belong to their respective general linear groups. We
write $\mathrm{GL}_d(\mathbb{R})$ for the group of nonsingular matrices in $\mathbb{R}^{d \times d}$. Thus $L \in \mathrm{GL}_{d_1}(\mathbb{R})$,
$M \in \mathrm{GL}_{d_2}(\mathbb{R})$, $N \in \mathrm{GL}_{d_3}(\mathbb{R})$. In addition to general linear transformations, it is
natural to consider orthogonal transformations. We write $\mathrm{O}_d(\mathbb{R})$ for the subgroup of
$\mathrm{GL}_d(\mathbb{R})$ of transformations which preserve the Euclidean inner product. The following
shorthand is helpful:

$$\mathrm{GL}_{d_1, \dots, d_k}(\mathbb{R}) := \mathrm{GL}_{d_1}(\mathbb{R}) \times \cdots \times \mathrm{GL}_{d_k}(\mathbb{R}),$$
$$\mathrm{O}_{d_1, \dots, d_k}(\mathbb{R}) := \mathrm{O}_{d_1}(\mathbb{R}) \times \cdots \times \mathrm{O}_{d_k}(\mathbb{R}).$$

Then $\mathrm{O}_{d_1, \dots, d_k}(\mathbb{R}) \le \mathrm{GL}_{d_1, \dots, d_k}(\mathbb{R})$, and both groups act on $\mathbb{R}^{d_1 \times \cdots \times d_k}$ via multilinear
multiplication.

Definition 2.1. Two tensors $A, A' \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ are said to be GL-equivalent
(or simply 'equivalent') if there exists $(L_1, \dots, L_k) \in \mathrm{GL}_{d_1, \dots, d_k}(\mathbb{R})$ such that $A' =
(L_1, \dots, L_k) \cdot A$. More strongly, we say that $A, A'$ are O-equivalent if such a
transformation can be found in $\mathrm{O}_{d_1, \dots, d_k}(\mathbb{R})$.

For example, if $V_1, \dots, V_k$ are vector spaces and $\dim(V_i) = d_i$, then $A, A' \in
\mathbb{R}^{d_1 \times \cdots \times d_k}$ represent the same tensor in $V_1 \otimes \cdots \otimes V_k$ with respect to two different
bases iff $A, A'$ are GL-equivalent.

We finish with some trivial properties of multilinear matrix multiplication: for
$A, B \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $\alpha, \beta \in \mathbb{R}$,

$$(L_1, \dots, L_k) \cdot (\alpha A + \beta B) = \alpha (L_1, \dots, L_k) \cdot A + \beta (L_1, \dots, L_k) \cdot B \tag{2.7}$$

and for $L_i \in \mathbb{R}^{c_i \times d_i}$, $M_i \in \mathbb{R}^{b_i \times c_i}$, $i = 1, \dots, k$,

$$(M_1, \dots, M_k) \cdot [(L_1, \dots, L_k) \cdot A] = (M_1 L_1, \dots, M_k L_k) \cdot A. \tag{2.8}$$

Lastly, the name multilinear matrix multiplication is justified since for any $M_i, N_i \in
\mathbb{R}^{c_i \times d_i}$, $\alpha, \beta \in \mathbb{R}$,

$$(L_1, \dots, \alpha M_i + \beta N_i, \dots, L_k) \cdot A = \alpha (L_1, \dots, M_i, \dots, L_k) \cdot A + \beta (L_1, \dots, N_i, \dots, L_k) \cdot A. \tag{2.9}$$
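
Properties (2.7), (2.8) and (2.9) are easy to confirm numerically for order 3. A minimal sketch, assuming NumPy; the helper `mlmul`, the dimensions and the scalars are illustrative choices.

```python
import numpy as np

def mlmul(mats, A):
    """(L_1, L_2, L_3) . A for an order-3 array A."""
    L1, L2, L3 = mats
    return np.einsum("pi,qj,rk,ijk->pqr", L1, L2, L3, A)

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3, 4))
B = rng.standard_normal((2, 3, 4))
Ls = [rng.standard_normal((c, d)) for c, d in [(3, 2), (4, 3), (5, 4)]]   # c_i x d_i
Ms = [rng.standard_normal((b, c)) for b, c in [(2, 3), (2, 4), (2, 5)]]   # b_i x c_i
alpha, beta = 0.7, -1.3

# (2.7): linearity in the tensor argument
lhs_27 = mlmul(Ls, alpha * A + beta * B)
rhs_27 = alpha * mlmul(Ls, A) + beta * mlmul(Ls, B)

# (2.8): composing two multiplications corresponds to the matrix products M_i L_i
lhs_28 = mlmul(Ms, mlmul(Ls, A))
rhs_28 = mlmul([M @ L for M, L in zip(Ms, Ls)], A)

# (2.9): linearity in a single matrix slot (here the second one)
P = rng.standard_normal((4, 3))
Q = rng.standard_normal((4, 3))
lhs_29 = mlmul([Ls[0], alpha * P + beta * Q, Ls[2]], A)
rhs_29 = alpha * mlmul([Ls[0], P, Ls[2]], A) + beta * mlmul([Ls[0], Q, Ls[2]], A)
```

Each left-hand side matches its right-hand side up to floating-point rounding.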

2.2. Outer-product rank and outer-product decomposition of a tensor. 

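Let $\mathbb{R}^{d_1} \otimes \cdots \otimes \mathbb{R}^{d_k}$ be the tensor product of the vector spaces $\mathbb{R}^{d_1}, \dots, \mathbb{R}^{d_k}$. Note
that the Segre map

$$\mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \to \mathbb{R}^{d_1 \times \cdots \times d_k}, \qquad (x_1, \dots, x_k) \mapsto [\![x^{(1)}_{j_1} \cdots x^{(k)}_{j_k}]\!]_{j_1, \dots, j_k = 1}^{d_1, \dots, d_k}, \tag{2.10}$$

where $x^{(i)}_j$ denotes the $j$th coordinate of $x_i$,
is multilinear and so by the universal property of tensor product [TJ [HI H7J [3U Ell
[521 [591 [CT1 1531 [75], we have a unique linear map $\varphi$ such that the following diagram
commutes:

$$\mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \xrightarrow{\ \otimes\ } \mathbb{R}^{d_1} \otimes \cdots \otimes \mathbb{R}^{d_k} \xrightarrow{\ \varphi\ } \mathbb{R}^{d_1 \times \cdots \times d_k},$$

where the composite of the two arrows is the Segre map (2.10). Clearly,

$$\varphi(x_1 \otimes \cdots \otimes x_k) = [\![x^{(1)}_{j_1} \cdots x^{(k)}_{j_k}]\!]_{j_1, \dots, j_k = 1}^{d_1, \dots, d_k} \tag{2.11}$$

and $\varphi$ is a vector space isomorphism since $\dim(\mathbb{R}^{d_1 \times \cdots \times d_k}) = \dim(\mathbb{R}^{d_1} \otimes \cdots \otimes \mathbb{R}^{d_k}) =
d_1 \cdots d_k$. Henceforth we will not distinguish between these two spaces. The elements
of $\mathbb{R}^{d_1} \otimes \cdots \otimes \mathbb{R}^{d_k} = \mathbb{R}^{d_1 \times \cdots \times d_k}$ will be called tensors and we will also drop $\varphi$ in
(2.11) and write

$$x_1 \otimes \cdots \otimes x_k = [\![x^{(1)}_{j_1} \cdots x^{(k)}_{j_k}]\!]_{j_1, \dots, j_k = 1}^{d_1, \dots, d_k}. \tag{2.12}$$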

Note that the symbol $\otimes$ in (2.11) denotes the formal tensor product and by dropping
$\varphi$, we are using the same symbol $\otimes$ to define the outer product of the vectors $x_1, \dots, x_k$
via the formula (2.12). Hence, a tensor can be represented either as a $k$-dimensional
array or as a sum of formal tensor products of $k$ vectors, where the equivalence
between these two objects is established by taking the formal tensor product of $k$ vectors
as defining a $k$-way array via (2.12).

It is clear that the map in (2.10) is not surjective: the image consists precisely
of the decomposable tensors. A tensor $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ is said to be decomposable if it
can be written in the form

$$A = x_1 \otimes \cdots \otimes x_k$$

with $x_i \in \mathbb{R}^{d_i}$ for $i = 1, \dots, k$. It is easy to see that multilinear matrix multiplication
of decomposable tensors obeys the formula:

$$(L_1, \dots, L_k) \cdot (x_1 \otimes \cdots \otimes x_k) = L_1 x_1 \otimes \cdots \otimes L_k x_k. \tag{2.13}$$
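
The outer product (2.12) and the rule (2.13) can be illustrated for $k = 3$ as follows. This is a minimal sketch assuming NumPy; the vectors and matrices are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
x1, x2, x3 = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)
L1 = rng.standard_normal((3, 2))
L2 = rng.standard_normal((3, 3))
L3 = rng.standard_normal((2, 4))

# Outer product as in (2.12): entry (j1, j2, j3) is x1[j1] * x2[j2] * x3[j3]
T = np.einsum("i,j,k->ijk", x1, x2, x3)

# Left side of (2.13): multiply the array on each of its three sides
lhs = np.einsum("pi,qj,rk,ijk->pqr", L1, L2, L3, T)
# Right side of (2.13): transform the vectors first, then take the outer product
rhs = np.einsum("p,q,r->pqr", L1 @ x1, L2 @ x2, L3 @ x3)

# Flattening a nonzero decomposable tensor along its first index
# gives a matrix of rank 1.
unfold_rank = np.linalg.matrix_rank(T.reshape(2, -1))
```

Both sides of (2.13) coincide up to rounding, so multilinear multiplication maps decomposable tensors to decomposable tensors.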

Remark. The outer product can be viewed as a special case of multilinear matrix
multiplication. For example, a linear combination of outer products of vectors may
be expressed in terms of multilinear matrix multiplication:

$$\sum_{i=1}^{r} \lambda_i\, x_i \otimes y_i \otimes z_i = (X, Y, Z) \cdot \Lambda$$

with matrices $X = [x_1, \dots, x_r] \in \mathbb{R}^{l \times r}$, $Y = [y_1, \dots, y_r] \in \mathbb{R}^{m \times r}$, $Z = [z_1, \dots, z_r] \in
\mathbb{R}^{n \times r}$ and a 'diagonal tensor' $\Lambda = \operatorname{diag}[\lambda_1, \dots, \lambda_r] \in \mathbb{R}^{r \times r \times r}$.
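
The identity in the remark above can be checked directly. A minimal sketch, assuming NumPy; the sizes, the weights `lam` and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
l, m, n, r = 3, 4, 5, 2
X = rng.standard_normal((l, r))
Y = rng.standard_normal((m, r))
Z = rng.standard_normal((n, r))
lam = np.array([2.0, -0.5])

# Diagonal tensor Lambda in R^{r x r x r}: entry (i, i, i) is lam[i], zero elsewhere
Lam = np.zeros((r, r, r))
Lam[np.arange(r), np.arange(r), np.arange(r)] = lam

# (X, Y, Z) . Lambda via multilinear matrix multiplication
via_mlmul = np.einsum("pi,qj,rk,ijk->pqr", X, Y, Z, Lam)

# Weighted sum of outer products of matching columns of X, Y, Z
via_sum = sum(
    lam[i] * np.einsum("p,q,r->pqr", X[:, i], Y[:, i], Z[:, i]) for i in range(r)
)
```

The two arrays agree up to rounding; storing the factor matrices $X, Y, Z$ and the weights is a compact way to hold a sum of $r$ decomposable tensors.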
We now come to the main concept of interest in this paper. 

Definition 2.2. A tensor has outer-product rank $r$ if it can be written as a
sum of $r$ decomposable tensors, but no fewer. We will write $\operatorname{rank}_\otimes(A)$ for the
outer-product rank of $A$. So

$$\operatorname{rank}_\otimes(A) := \min\Big\{r \;\Big|\; A = \sum_{i=1}^{r} u_i \otimes v_i \otimes \cdots \otimes z_i\Big\}.$$

Note that a non-zero decomposable tensor has outer-product rank 1.

Despite several claims of originality as well as many misplaced attributions to
these claims, the concepts of tensor rank and the decomposition of a tensor into a sum
of outer-products of vectors were the product of much earlier work by Frank L.
Hitchcock in 1927 [39] [40]. We call this the outer-product rank mainly to distinguish it from
the multilinear rank to be defined in Section 2.5 (due to Hitchcock) but we will use
the term rank or tensor rank most of the time when there is no danger of confusion.

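Lemma 2.3 (Invariance of tensor rank). (1) If $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $(L_1, \dots, L_k) \in
\mathbb{R}^{c_1 \times d_1} \times \cdots \times \mathbb{R}^{c_k \times d_k}$, then

$$\operatorname{rank}_\otimes((L_1, \dots, L_k) \cdot A) \le \operatorname{rank}_\otimes(A). \tag{2.14}$$

(2) If $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $(L_1, \dots, L_k) \in \mathrm{GL}_{d_1, \dots, d_k}(\mathbb{R}) := \mathrm{GL}_{d_1}(\mathbb{R}) \times \cdots \times \mathrm{GL}_{d_k}(\mathbb{R})$,
then

$$\operatorname{rank}_\otimes((L_1, \dots, L_k) \cdot A) = \operatorname{rank}_\otimes(A). \tag{2.15}$$

Proof. (2.14) follows from (2.13) and (2.7). Indeed, if $A = \sum_{j=1}^{r} x_1^j \otimes \cdots \otimes x_k^j$
then $(L_1, \dots, L_k) \cdot A = \sum_{j=1}^{r} L_1 x_1^j \otimes \cdots \otimes L_k x_k^j$. Furthermore, if the $L_i$ are invertible
then by (2.8) we get

$$A = (L_1^{-1}, \dots, L_k^{-1}) \cdot [(L_1, \dots, L_k) \cdot A]$$

and so $\operatorname{rank}_\otimes(A) \le \operatorname{rank}_\otimes((L_1, \dots, L_k) \cdot A)$, hence (2.15). □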

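2.3. The outer product and direct sum operations on tensors. The outer
product of vectors defined earlier is a special case of the outer product of two tensors.
Let $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ be a tensor of order $k$ and $B \in \mathbb{R}^{c_1 \times \cdots \times c_\ell}$ be a tensor of order $\ell$,
then the outer product of $A$ and $B$ is the tensor $C := A \otimes B \in \mathbb{R}^{d_1 \times \cdots \times d_k \times c_1 \times \cdots \times c_\ell}$ of
order $k + \ell$ defined by

$$c_{i_1 \cdots i_k j_1 \cdots j_\ell} = a_{i_1 \cdots i_k} b_{j_1 \cdots j_\ell}.$$

The direct sum of two order-$k$ tensors $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $B \in \mathbb{R}^{c_1 \times \cdots \times c_k}$ is the
order-$k$ tensor $C := A \oplus B \in \mathbb{R}^{(c_1 + d_1) \times \cdots \times (c_k + d_k)}$ defined by

$$c_{i_1 \cdots i_k} = \begin{cases} a_{i_1 \cdots i_k} & \text{if } 1 \le i_\alpha \le d_\alpha,\ \alpha = 1, \dots, k; \\ b_{i_1 - d_1, \dots, i_k - d_k} & \text{if } d_\alpha + 1 \le i_\alpha \le c_\alpha + d_\alpha,\ \alpha = 1, \dots, k; \\ 0 & \text{otherwise.} \end{cases}$$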

For matrices, the direct sum of $A \in \mathbb{R}^{m_1 \times n_1}$ and $B \in \mathbb{R}^{m_2 \times n_2}$ is simply the
block-diagonal matrix

$$A \oplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \in \mathbb{R}^{(m_1 + m_2) \times (n_1 + n_2)}.$$

The direct sum of two order-3 tensors $A \in \mathbb{R}^{l_1 \times m_1 \times n_1}$ and $B \in \mathbb{R}^{l_2 \times m_2 \times n_2}$ is a 'block
tensor' with $A$ in the (1, 1, 1)-block and $B$ in the (2, 2, 2)-block:

$$A \oplus B \in \mathbb{R}^{(l_1 + l_2) \times (m_1 + m_2) \times (n_1 + n_2)}.$$

In abstract terms, if $U_i, V_i, W_i$ are vector spaces such that $W_i = U_i \oplus V_i$ for
$i = 1, \dots, k$, then tensors $A \in U_1 \otimes \cdots \otimes U_k$ and $B \in V_1 \otimes \cdots \otimes V_k$ have direct sum
$A \oplus B \in W_1 \otimes \cdots \otimes W_k$.
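
On arrays, both operations of this section are short to write down. The following is a sketch assuming NumPy; the function name `direct_sum` and the small example arrays are illustrative.

```python
import numpy as np

def direct_sum(A, B):
    """Direct sum of two arrays of the same order: A occupies the leading
    block, B the trailing block, and all remaining entries are zero."""
    if A.ndim != B.ndim:
        raise ValueError("A and B must have the same order")
    C = np.zeros([da + db for da, db in zip(A.shape, B.shape)])
    C[tuple(slice(0, d) for d in A.shape)] = A
    C[tuple(slice(d, None) for d in A.shape)] = B
    return C

# Order 2: the usual block-diagonal matrix
A2 = np.array([[1.0, 2.0]])        # 1 x 2
B2 = np.array([[3.0], [4.0]])      # 2 x 1
C2 = direct_sum(A2, B2)            # 3 x 3

# Order 3: A in the (1,1,1)-block, B in the (2,2,2)-block
A3 = np.ones((1, 2, 2))
B3 = 2 * np.ones((2, 1, 1))
C3 = direct_sum(A3, B3)            # 3 x 3 x 3

# Outer product of two tensors: the orders add up (here 3 + 3 = 6)
C_outer = np.multiply.outer(A3, B3)
```

For the order-2 example, `C2` is the matrix with rows `[1, 2, 0]`, `[0, 0, 3]`, `[0, 0, 4]`.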

2.4. Tensor subspaces. Whenever $c \le d$ there is a canonical embedding $\mathbb{R}^c \subseteq
\mathbb{R}^d$ given by identifying the $c$ coordinates of $\mathbb{R}^c$ with the first $c$ coordinates of $\mathbb{R}^d$.

Let $c_i \le d_i$ for $i = 1, \dots, k$. Then there is a canonical embedding $\mathbb{R}^{c_1 \times \cdots \times c_k} \subseteq
\mathbb{R}^{d_1 \times \cdots \times d_k}$, defined as the tensor product of the embeddings $\mathbb{R}^{c_i} \subseteq \mathbb{R}^{d_i}$. We say that
$\mathbb{R}^{c_1 \times \cdots \times c_k}$ is a tensor subspace of $\mathbb{R}^{d_1 \times \cdots \times d_k}$. More generally, if $U_i, V_i$ are vector spaces
with $U_i \subseteq V_i$ for $i = 1, \dots, k$, then there is an inclusion $U_1 \otimes \cdots \otimes U_k \subseteq V_1 \otimes \cdots \otimes V_k$
defined as the tensor product of the inclusions $U_i \subseteq V_i$. Again we say that $U_1 \otimes \cdots \otimes U_k$
is a tensor subspace of $V_1 \otimes \cdots \otimes V_k$.

If $B \in \mathbb{R}^{c_1 \times \cdots \times c_k}$ then its image under the canonical embedding into $\mathbb{R}^{d_1 \times \cdots \times d_k}$ can
be written in the form $B \oplus 0$, where $0 \in \mathbb{R}^{(d_1 - c_1) \times \cdots \times (d_k - c_k)}$ is the zero tensor. A
tensor $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ is said to be GL-equivalent (or simply 'equivalent') to $B$ if there
exists $(L_1, \dots, L_k) \in \mathrm{GL}_{d_1, \dots, d_k}(\mathbb{R})$ such that $B \oplus 0 = (L_1, \dots, L_k) \cdot A$. More strongly,
we say that $A$ is O-equivalent ('orthogonally equivalent') to $B$ if such a transformation
can be found in $\mathrm{O}_{d_1, \dots, d_k}(\mathbb{R})$.

We note that $A$ is GL-equivalent to $B$ if and only if there exist full-rank matrices
$M_i \in \mathbb{R}^{d_i \times c_i}$ such that $A = (M_1, \dots, M_k) \cdot B$. In one direction, $M_i$ can be obtained
as the first $c_i$ columns of $L_i^{-1}$. In the other direction, $L_i^{-1}$ can be obtained from $M_i$
by adjoining extra columns. There is a similar statement for O-equivalence. Instead
of full rank, the condition is that the matrices $M_i$ have orthonormal columns.

An important simplifying principle in tensor algebra is that questions about a 
tensor — such as 'what is its rank?' — can sometimes, as we shall see, be reduced 
to analogous questions about an equivalent tensor in a lower-dimensional tensor sub- 
space. 

2.5. Multilinear rank and multilinear decomposition of a tensor. Al- 
though we focus on outer product rank in this paper, there is a simpler notion of 
multilinear rank which directly generalizes the column and row ranks of a matrix to 
higher order tensors. 

For convenience, we will consider order-3 tensors only. Let $A = [a_{ijk}] \in \mathbb{R}^{d_1 \times d_2 \times d_3}$. For fixed values of $j \in \{1, \dots, d_2\}$ and $k \in \{1, \dots, d_3\}$, consider the vector $A_{\bullet jk} := [a_{ijk}]_{i=1}^{d_1} \in \mathbb{R}^{d_1}$. Likewise consider (column) vectors $A_{i \bullet k} := [a_{ijk}]_{j=1}^{d_2} \in \mathbb{R}^{d_2}$ for fixed values of $i, k$, and (row) vectors $A_{ij\bullet} := [a_{ijk}]_{k=1}^{d_3} \in \mathbb{R}^{d_3}$ for fixed values of $i, j$. In analogy with row rank and column rank, define

$$r_1(A) := \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet jk} \mid 1 \le j \le d_2,\ 1 \le k \le d_3\}),$$
$$r_2(A) := \dim(\operatorname{span}_{\mathbb{R}}\{A_{i \bullet k} \mid 1 \le i \le d_1,\ 1 \le k \le d_3\}),$$
$$r_3(A) := \dim(\operatorname{span}_{\mathbb{R}}\{A_{ij\bullet} \mid 1 \le i \le d_1,\ 1 \le j \le d_2\}).$$

For another interpretation, note that $\mathbb{R}^{d_1 \times d_2 \times d_3}$ can be viewed as $\mathbb{R}^{d_1 \times d_2 d_3}$ by ignoring the multiplicative structure between the second and third factors. Then $r_1(A)$ is simply the rank of $A$ regarded as a $d_1 \times d_2 d_3$ matrix. There are similar definitions for $r_2(A)$ and $r_3(A)$.
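As a numerical illustration of this second interpretation, the NumPy sketch below computes each $r_i(A)$ as the matrix rank of the corresponding flattening. The helper names `unfold` and `multilinear_rank` and the tolerance `tol` are illustrative assumptions; a rank computed in floating point is only reliable up to that tolerance.

```python
import numpy as np

def unfold(A, mode):
    """Flatten A into a matrix whose rows are indexed by the given mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def multilinear_rank(A, tol=1e-10):
    """Return (r_1(A), ..., r_k(A)) as the ranks of the mode flattenings."""
    return tuple(int(np.linalg.matrix_rank(unfold(A, m), tol=tol))
                 for m in range(A.ndim))

# Example in R^{3x2x4}: A = (x1 o y1 + x2 o y2) o z, so the third
# factor only ever sees the single vector z.
x1, x2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
y1, y2 = np.array([1.0, 2.0]), np.array([0.0, 1.0])
z = np.array([1.0, 1.0, 1.0, 1.0])
A = np.einsum('i,j,k->ijk', x1, y1, z) + np.einsum('i,j,k->ijk', x2, y2, z)

print(multilinear_rank(A))  # (2, 2, 1)
```

The example shows that the three entries of the multilinear rank need not coincide.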

The multilinear rank of $A$, denoted $\operatorname{rank}_\boxplus(A)$, is the 3-tuple $(r_1(A), r_2(A), r_3(A))$. Again, this concept is not new but was first explored by Hitchcock under the name multiplex rank in the same papers where he defined tensor rank [39, 40]. What we term multilinear rank will be equivalent to Hitchcock's duplex rank. A point to note is that $r_1(A)$, $r_2(A)$, $r_3(A)$, and $\operatorname{rank}_\otimes(A)$ are in general all different, a departure from the case of matrices, where the row rank, column rank and outer product rank are always equal. Observe that we will always have

$$r_i(A) \le \min\{\operatorname{rank}_\otimes(A),\ d_i\}. \tag{2.16}$$

Let us verify this for $r_1$: if $A = x_1 \otimes y_1 \otimes z_1 + \cdots + x_r \otimes y_r \otimes z_r$ then each vector $A_{\bullet jk}$ belongs to $\operatorname{span}(x_1, \dots, x_r)$. This implies that $r_1 \le \operatorname{rank}_\otimes(A)$, and $r_1 \le d_1$ is immediate from the definitions. A simple but useful consequence of (2.16) is that

$$\operatorname{rank}_\otimes(A) \ge \|\operatorname{rank}_\boxplus(A)\|_\infty = \max\{r_i(A) \mid i = 1, \dots, k\}. \tag{2.17}$$

If $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ and $\operatorname{rank}_\boxplus(A) = (r_1, r_2, r_3)$, then there exist subspaces $U_i \subset \mathbb{R}^{d_i}$ with $\dim(U_i) = r_i$, such that $A \in U_1 \otimes U_2 \otimes U_3$. We call these the supporting subspaces of $A$. The supporting subspaces are minimal, in the sense that if $A \in V_1 \otimes V_2 \otimes V_3$ then $U_i \subset V_i$ for $i = 1, 2, 3$. This observation leads to an alternate definition:

$$r_i(A) = \min\{\dim(U_i) \mid U_1 \subset \mathbb{R}^{d_1},\ U_2 \subset \mathbb{R}^{d_2},\ U_3 \subset \mathbb{R}^{d_3},\ A \in U_1 \otimes U_2 \otimes U_3\}.$$

An immediate consequence of this characterization is that $\operatorname{rank}_\boxplus(A)$ is invariant under the action of $\mathrm{GL}_{d_1,d_2,d_3}(\mathbb{R})$: if $A' = (L, M, N) \cdot A$, where $(L, M, N) \in \mathrm{GL}_{d_1,d_2,d_3}(\mathbb{R})$, then $\operatorname{rank}_\boxplus(A) = \operatorname{rank}_\boxplus((L, M, N) \cdot A)$. Indeed, if $U_1, U_2, U_3$ are the supporting subspaces of $A$, then $L(U_1)$, $M(U_2)$, $N(U_3)$ are the supporting subspaces of $(L, M, N) \cdot A$.

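More generally, we have multilinear rank equivalents of (2.14) and (2.15): if $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $(L_1, \dots, L_k) \in \mathbb{R}^{c_1 \times d_1} \times \cdots \times \mathbb{R}^{c_k \times d_k}$, then

$$\operatorname{rank}_\boxplus((L_1, \dots, L_k) \cdot A) \le \operatorname{rank}_\boxplus(A), \tag{2.18}$$

and if $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $(L_1, \dots, L_k) \in \mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$, then

$$\operatorname{rank}_\boxplus((L_1, \dots, L_k) \cdot A) = \operatorname{rank}_\boxplus(A). \tag{2.19}$$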

Suppose $\operatorname{rank}_\boxplus(A) = (r_1, r_2, r_3)$. By applying transformations $L_i \in \mathrm{GL}_{d_i}(\mathbb{R})$ which carry $U_i$ to $\mathbb{R}^{r_i}$ it follows that $A$ is equivalent to some $B \in \mathbb{R}^{r_1 \times r_2 \times r_3}$. Alternatively there exist $B \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and full-rank matrices $L \in \mathbb{R}^{d_1 \times r_1}$, $M \in \mathbb{R}^{d_2 \times r_2}$, $N \in \mathbb{R}^{d_3 \times r_3}$, such that

$$A = (L, M, N) \cdot B.$$

The matrices $L, M, N$ may be chosen to have orthonormal columns or to be unit lower-triangular, a fact easily deduced from applying the QR-decomposition or the LU-decomposition to the full-rank matrices $L, M, N$ and using (2.8).

2 The symbol $\boxplus$ is meant to evoke an impression of the rows and columns in a matrix.
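The reduction $A = (L, M, N) \cdot B$ can be carried out numerically. The sketch below assumes NumPy and obtains orthonormal bases of the supporting subspaces from the singular value decomposition of each flattening; this is one standard choice rather than a procedure prescribed here, and the names `multilinear_multiply` and `compress` are hypothetical.

```python
import numpy as np

def unfold(A, mode):
    """Flatten A into a matrix whose rows are indexed by the given mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def multilinear_multiply(mats, B):
    """Compute (M_1, ..., M_k) . B by applying M_i along mode i."""
    for mode, M in enumerate(mats):
        B = np.moveaxis(np.tensordot(M, B, axes=(1, mode)), 0, mode)
    return B

def compress(A, tol=1e-10):
    """Return (factors, core) with A = factors . core, factors orthonormal."""
    factors = []
    for mode in range(A.ndim):
        U, s, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        r = int(np.sum(s > tol))      # numerical value of r_i(A)
        factors.append(U[:, :r])      # orthonormal basis of U_i
    core = multilinear_multiply([U.T for U in factors], A)
    return factors, core

# Build a tensor in R^{4x5x6} with multilinear rank (2, 3, 2).
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
mats = [rng.standard_normal((d, r)) for d, r in zip((4, 5, 6), G.shape)]
A = multilinear_multiply(mats, G)

factors, B = compress(A)
print(B.shape)                                            # core dimensions
print(np.allclose(multilinear_multiply(factors, B), A))   # reconstruction check
```

The core `B` has dimensions equal to the multilinear rank, and multiplying it by the orthonormal factors recovers `A` up to rounding.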

To a large extent, the study of tensors $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ with $\operatorname{rank}_\boxplus(A) \le (r_1, r_2, r_3)$ reduces to the study of tensors in $\mathbb{R}^{r_1 \times r_2 \times r_3}$. This is a useful reduction, but (unlike the matrix case) it does not even come close to giving us a full classification of tensor types.

2.6. Multilinear orthogonal projection. If $U$ is a subspace of an inner-product space $V$ (for instance, $V = \mathbb{R}^n$ with the usual dot product), then there is an orthogonal projection from $V$ onto $U$, which we denote $\pi_U$. We regard this as a map $V \to V$. As such, it is self-adjoint (i.e. has a symmetric matrix with respect to any orthonormal basis), and satisfies $\pi_U^2 = \pi_U$, $\operatorname{im}(\pi_U) = U$, $\ker(\pi_U) = U^\perp$. We note Pythagoras' theorem for any $v \in V$:

$$\|v\|^2 = \|\pi_U v\|^2 + \|(1 - \pi_U) v\|^2.$$

We now consider orthogonal projections for tensor spaces. If $U_1, U_2, U_3$ are subspaces of $V_1, V_2, V_3$, respectively, then $U_1 \otimes U_2 \otimes U_3$ is a tensor subspace of $V_1 \otimes V_2 \otimes V_3$, and the multilinear map $\Pi = (\pi_{U_1}, \pi_{U_2}, \pi_{U_3})$ is a projection onto that subspace. In fact, $\Pi$ is orthogonal with respect to the Frobenius norm. The easiest way to see this is to identify $U_i \subset V_i$ with $\mathbb{R}^{c_i} \subset \mathbb{R}^{d_i}$ by taking suitable orthonormal bases; then $\Pi$ acts by zeroing out all the entries of a $d_1 \times d_2 \times d_3$ array outside the initial $c_1 \times c_2 \times c_3$ block. In particular we have Pythagoras' theorem for any $A \in V_1 \otimes V_2 \otimes V_3$:

$$\|A\|_F^2 = \|\Pi A\|_F^2 + \|(1 - \Pi) A\|_F^2. \tag{2.20}$$

Being a multilinear map, $\Pi$ is non-increasing for $\operatorname{rank}_\otimes$ and $\operatorname{rank}_\boxplus$, as in (2.14) and (2.18).

There is a useful orthogonal projection $\Pi_A$ associated with any tensor $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$. Let $U_1, U_2, U_3$ be the supporting subspaces of $A$, so that $A \in U_1 \otimes U_2 \otimes U_3$, and $\dim(U_i) = r_i(A)$ for $i = 1, 2, 3$. Define:

$$\Pi_A = (\pi_1(A), \pi_2(A), \pi_3(A)) := (\pi_{U_1}, \pi_{U_2}, \pi_{U_3}).$$

Proposition 2.4. $\Pi_A(A) = A$.

Proof. $A$ belongs to $U_1 \otimes U_2 \otimes U_3$, which is fixed by $\Pi_A$. □

Proposition 2.5. The function $A \mapsto \Pi_A$ is continuous over subsets of $\mathbb{R}^{d_1 \times d_2 \times d_3}$ on which $\operatorname{rank}_\boxplus(A)$ is constant.

Proof. We show, for example, that $\pi_1 = \pi_1(A)$ depends continuously on $A$. For any $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, select $r = r_1(A)$ index pairs $(j, k)$ such that the vectors $A_{\bullet jk}$ are linearly independent. For any $B$ near $A$, assemble the marked vectors as a matrix $X = X(B) \in \mathbb{R}^{d_1 \times r}$. Then $\pi_1 = X (X^T X)^{-1} X^T =: P(B)$ by a well-known formula in linear algebra. The function $P(B)$ is defined and continuous as long as the $r$ selected vectors remain independent, which is true on a neighborhood of $A$. Finally, the orthogonal projection defined by $P(B)$ maps onto the span of the $r$ selected vectors. Thus, if $r_1(B) = r$ then $P(B) = \pi_1(B)$. □
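The formula $P = X (X^T X)^{-1} X^T$ used in this proof is easy to check numerically. The following NumPy sketch is an illustration only: the small tensor, the choice of marked fibres, and the name `projector` are assumptions made for the example. It verifies that $P$ is a symmetric idempotent, that applying it in the first factor fixes $A$ (as in Proposition 2.4), and that Pythagoras' theorem holds for a test vector.

```python
import numpy as np

def projector(X):
    """Orthogonal projector X (X^T X)^{-1} X^T onto the column span of X."""
    return X @ np.linalg.solve(X.T @ X, X.T)

# A tensor in R^{3x2x2} whose mode-1 fibres span the plane span(u, v).
u, v = np.array([1.0, 1.0, 0.0]), np.array([0.0, 1.0, 1.0])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
A = (np.einsum('i,j,k->ijk', u, e1, np.array([1.0, 2.0]))
     + np.einsum('i,j,k->ijk', v, e2, np.array([1.0, -1.0])))

# Mark r_1(A) = 2 independent fibres: (j, k) = (0, 0) gives u, (1, 0) gives v.
X = np.stack([A[:, 0, 0], A[:, 1, 0]], axis=1)
P = projector(X)

# Apply (P, I, I) to A; the result should be A itself.
PA = np.einsum('ai,ijk->ajk', P, A)

# Pythagoras for an arbitrary vector w.
w = np.array([1.0, 2.0, 3.0])
lhs = w @ w
rhs = np.sum((P @ w) ** 2) + np.sum((w - P @ w) ** 2)
print(np.allclose(PA, A), np.isclose(lhs, rhs))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience and does not change the formula.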

It is clear that the results of this section apply to tensor spaces of all orders. 

3. The algebra of tensor rank. We will state and prove a few basic results 
about the outer-product rank. 

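PROPOSITION 3.1. Let $A \in \mathbb{R}^{c_1 \times \cdots \times c_k} \subset \mathbb{R}^{d_1 \times \cdots \times d_k}$. The rank of $A$ regarded as a tensor in $\mathbb{R}^{c_1 \times \cdots \times c_k}$ is the same as the rank of $A$ regarded as a tensor in $\mathbb{R}^{d_1 \times \cdots \times d_k}$.

Proof. For each $i$ the identity on $\mathbb{R}^{c_i}$ factors as a pair of maps $\mathbb{R}^{c_i} \xrightarrow{\iota_i} \mathbb{R}^{d_i} \xrightarrow{\pi_i} \mathbb{R}^{c_i}$, where $\iota_i$ is the canonical inclusion and $\pi_i$ is the projection map given by deleting the last $d_i - c_i$ coordinates. Applying (2.14) twice, we have

$$\operatorname{rank}_\otimes(A) \ge \operatorname{rank}_\otimes((\iota_1, \dots, \iota_k) \cdot A) \ge \operatorname{rank}_\otimes((\pi_1, \dots, \pi_k) \cdot (\iota_1, \dots, \iota_k) \cdot A)$$
$$= \operatorname{rank}_\otimes((\pi_1 \iota_1, \dots, \pi_k \iota_k) \cdot A) = \operatorname{rank}_\otimes(A),$$

so $A \in \mathbb{R}^{c_1 \times \cdots \times c_k}$ and its image $(\iota_1, \dots, \iota_k) \cdot A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ must have equal tensor ranks. □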

Corollary 3.2. Suppose $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and $\operatorname{rank}_\boxplus(A) \le (c_1, \dots, c_k)$. Then $\operatorname{rank}_\otimes(A) = \operatorname{rank}_\otimes(B)$ for an equivalent tensor $B \in \mathbb{R}^{c_1 \times \cdots \times c_k}$. □

The next corollary asserts that tensor rank is consistent under a different scenario: when order-$k$ tensors are regarded as order-$\ell$ tensors, for $\ell > k$, by taking the tensor product with a non-zero monomial term.

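Corollary 3.3. Let $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ be an order-$k$ tensor and $u_{k+1} \in \mathbb{R}^{d_{k+1}}, \dots, u_{k+\ell} \in \mathbb{R}^{d_{k+\ell}}$ be non-zero vectors. Then

$$\operatorname{rank}_\otimes(A) = \operatorname{rank}_\otimes(A \otimes u_{k+1} \otimes \cdots \otimes u_{k+\ell}).$$

Proof. Let $c_{k+1} = \cdots = c_{k+\ell} = 1$ and apply Proposition 3.1 to $A \in \mathbb{R}^{d_1 \times \cdots \times d_k} = \mathbb{R}^{d_1 \times \cdots \times d_k \times c_{k+1} \times \cdots \times c_{k+\ell}} \hookrightarrow \mathbb{R}^{d_1 \times \cdots \times d_k \times d_{k+1} \times \cdots \times d_{k+\ell}}$. Note that the image of the inclusion is $A \otimes e_1^{(k+1)} \otimes \cdots \otimes e_1^{(k+\ell)}$ where $e_1^{(i)} = (1, 0, \dots, 0)^T \in \mathbb{R}^{d_i}$. So we have

$$\operatorname{rank}_\otimes(A \otimes e_1^{(k+1)} \otimes \cdots \otimes e_1^{(k+\ell)}) = \operatorname{rank}_\otimes(A).$$

The general case for arbitrary non-zero $u_i \in \mathbb{R}^{d_i}$ follows from applying to $A \otimes e_1^{(k+1)} \otimes \cdots \otimes e_1^{(k+\ell)}$ a multilinear multiplication $(I_{d_1}, \dots, I_{d_k}, L_1, \dots, L_\ell) \in \mathrm{GL}_{d_1,\dots,d_{k+\ell}}(\mathbb{R})$ where $I_d$ is the $d \times d$ identity matrix and $L_i$ is a non-singular matrix with $L_i e_1^{(k+i)} = u_{k+i}$. It then follows from Lemma 2.3 that

$$\operatorname{rank}_\otimes(A \otimes u_{k+1} \otimes \cdots \otimes u_{k+\ell}) = \operatorname{rank}_\otimes[(I_{d_1}, \dots, I_{d_k}, L_1, \dots, L_\ell) \cdot (A \otimes e_1^{(k+1)} \otimes \cdots \otimes e_1^{(k+\ell)})]$$
$$= \operatorname{rank}_\otimes(A \otimes e_1^{(k+1)} \otimes \cdots \otimes e_1^{(k+\ell)}).$$

□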

Let $E = u_{k+1} \otimes u_{k+2} \otimes \cdots \otimes u_{k+\ell} \in \mathbb{R}^{d_{k+1} \times \cdots \times d_{k+\ell}}$. So $\operatorname{rank}_\otimes(E) = 1$ and Corollary 3.3 says that $\operatorname{rank}_\otimes(A \otimes E) = \operatorname{rank}_\otimes(A) \operatorname{rank}_\otimes(E)$. Note that this last relation does not generalize. If $\operatorname{rank}_\otimes(A) > 1$ and $\operatorname{rank}_\otimes(B) > 1$, then it is true that

$$\operatorname{rank}_\otimes(A \otimes B) \le \operatorname{rank}_\otimes(A) \operatorname{rank}_\otimes(B),$$

since one can multiply decompositions of $A$, $B$ term by term to obtain a decomposition of $A \otimes B$, but it can happen (cf. [T^]) that

$$\operatorname{rank}_\otimes(A \otimes B) < \operatorname{rank}_\otimes(A) \operatorname{rank}_\otimes(B).$$

The corresponding statement for direct sum is still an open problem for tensors of order 3 or higher. It has been conjectured by Strassen [69] that

$$\operatorname{rank}_\otimes(A \oplus B) = \operatorname{rank}_\otimes(A) + \operatorname{rank}_\otimes(B) \tag{3.1}$$

for all order-$k$ tensors $A$ and $B$. However JaJa and Takche [35] have shown that for the special case when $A$ and $B$ are of order 3 and at least one of them is a matrix pencil (i.e. a tensor of size $p \times q \times 2$, $p \times 2 \times q$, or $2 \times p \times q$ that may be regarded as a pair of $p \times q$ matrices), the direct sum conjecture holds.

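Theorem 3.4 (JaJa-Takche [35]). Let $A \in \mathbb{R}^{c_1 \times c_2 \times c_3}$ and $B \in \mathbb{R}^{d_1 \times d_2 \times d_3}$. If $2 \in \{c_1, c_2, c_3, d_1, d_2, d_3\}$, then

$$\operatorname{rank}_\otimes(A \oplus B) = \operatorname{rank}_\otimes(A) + \operatorname{rank}_\otimes(B).$$

□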

It is not hard to define tensors of arbitrarily high rank so long as we have suffi- 
ciently many linearly independent vectors in every factor. 

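Lemma 3.5. For $\ell = 1, \dots, k$, let $x_1^{(\ell)}, \dots, x_r^{(\ell)} \in \mathbb{R}^{d_\ell}$ be linearly independent. Then the tensor defined by

$$A := \sum_{j=1}^{r} x_j^{(1)} \otimes x_j^{(2)} \otimes \cdots \otimes x_j^{(k)} \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_k}$$

has $\operatorname{rank}_\otimes(A) = r$.

Proof. Note that $\operatorname{rank}_\boxplus(A) = (r, r, \dots, r)$. By (2.17), we get

$$\operatorname{rank}_\otimes(A) \ge \max\{r_i(A) \mid i = 1, \dots, k\} = r.$$

On the other hand, it is clear that $\operatorname{rank}_\otimes(A) \le r$. □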
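The construction in Lemma 3.5 can be tried out numerically. The sketch below is a minimal illustration assuming NumPy; the dimensions, the random seed, and the helper name `sum_of_decomposables` are arbitrary choices. It builds such a tensor from randomly drawn (hence, with probability one, linearly independent) vectors and confirms that every flattening has rank $r$, which is the lower bound used in the proof.

```python
import numpy as np

def unfold(A, mode):
    """Flatten A into a matrix whose rows are indexed by the given mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def sum_of_decomposables(factor_mats):
    """Given k matrices with r columns each, return
    sum_j x_j^(1) o x_j^(2) o ... o x_j^(k), using column j of each matrix."""
    r = factor_mats[0].shape[1]
    A = 0.0
    for j in range(r):
        term = factor_mats[0][:, j]
        for X in factor_mats[1:]:
            term = np.multiply.outer(term, X[:, j])
        A = A + term
    return A

rng = np.random.default_rng(1)
dims, r = (4, 5, 3), 3
X = [rng.standard_normal((d, r)) for d in dims]   # independent columns (generic)
A = sum_of_decomposables(X)

ranks = tuple(int(np.linalg.matrix_rank(unfold(A, m))) for m in range(A.ndim))
print(ranks)  # each entry equals r, so rank(A) >= r by (2.17)
```

Since `A` is written as a sum of $r$ decomposable terms, the bound from the flattenings is attained.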

Thus, in $\mathbb{R}^{d_1 \times \cdots \times d_k}$, it is easy to write down tensors of any rank $r$ in the range $0 \le r \le \min\{d_1, \dots, d_k\}$. For matrices, this exhausts all possibilities; the rank of $A \in \mathbb{R}^{d_1 \times d_2}$ is at most $\min\{d_1, d_2\}$. In contrast, for $k \ge 3$, there will always be tensors in $\mathbb{R}^{d_1 \times \cdots \times d_k}$ that have rank exceeding $\min\{d_1, \dots, d_k\}$. We will see this in Theorem 4.10.

4. The topology of tensor rank. Let $A = [a_{i_1 \cdots i_k}] \in \mathbb{R}^{d_1 \times \cdots \times d_k}$. The Frobenius norm of $A$ and its associated inner product are defined by

$$\|A\|_F^2 := \sum_{i_1, \dots, i_k = 1}^{d_1, \dots, d_k} |a_{i_1 \cdots i_k}|^2, \qquad \langle A, B \rangle_F := \sum_{i_1, \dots, i_k = 1}^{d_1, \dots, d_k} a_{i_1 \cdots i_k} b_{i_1 \cdots i_k}.$$

Note that for a decomposable tensor, the Frobenius norm satisfies

$$\|u \otimes v \otimes \cdots \otimes z\|_F = \|u\|_2 \|v\|_2 \cdots \|z\|_2 \tag{4.1}$$

where $\|\cdot\|_2$ denotes the $l^2$-norm of a vector, and more generally

$$\|A \otimes B\|_F = \|A\|_F \|B\|_F \tag{4.2}$$

for arbitrary tensors $A$, $B$. Another important property which follows from (2.13) and (4.1) is orthogonal invariance:

$$\|(L_1, \dots, L_k) \cdot A\|_F = \|A\|_F \tag{4.3}$$

whenever $(L_1, \dots, L_k) \in \mathrm{O}_{d_1,\dots,d_k}(\mathbb{R})$. There are of course many other natural choices of norms on tensor product spaces [25, 36]. The important thing to note is that, $\mathbb{R}^{d_1 \times \cdots \times d_k}$ being finite dimensional, all these norms will induce the same topology.

We define the following (topological) subspaces of $\mathbb{R}^{d_1 \times \cdots \times d_k}$.

$$\mathcal{S}_r(d_1, \dots, d_k) = \{A \in \mathbb{R}^{d_1 \times \cdots \times d_k} \mid \operatorname{rank}_\otimes(A) \le r\}$$
$$\overline{\mathcal{S}}_r(d_1, \dots, d_k) = \text{closure of } \mathcal{S}_r(d_1, \dots, d_k) \subset \mathbb{R}^{d_1 \times \cdots \times d_k}$$

Clearly the only reason to define $\overline{\mathcal{S}}_r$ is the sad fact that $\mathcal{S}_r$ is not necessarily (or even usually) closed, which is the theme of this paper. See Section 4.2.

We occasionally refer to elements of $\mathcal{S}_r$ as 'rank-$r$ tensors'. This is slightly inaccurate, since lower-rank tensors are included, but convenient. However, the direct assertions '$A$ has rank $r$' and '$\operatorname{rank}(A) = r$' are always meant in the precise sense. The same remarks apply to 'border rank', which is defined in Section 5.5. We refer to elements of $\overline{\mathcal{S}}_r$ as 'border-rank-$r$ tensors', and describe them as being 'rank-$r$-approximable'.

Theorem 5.1 asserts that $\overline{\mathcal{S}}_2(d_1, d_2, d_3) \subset \mathcal{S}_3(d_1, d_2, d_3)$ for all $d_1, d_2, d_3$, and that the exceptional tensors $\overline{\mathcal{S}}_2(d_1, d_2, d_3) \setminus \mathcal{S}_2(d_1, d_2, d_3)$ are all of a particular form.

4.1. Upper semicontinuity. Discrete-valued rank functions on spaces of matrices or tensors cannot usefully be continuous, because they would then be constant and would not have any classifying power. As a sort of compromise, matrix rank is well known to be an upper semicontinuous function; if $\operatorname{rank}(A) = r$ then $\operatorname{rank}(B) \ge r$ for all matrices $B$ in a neighborhood of $A$. This is not true for the outer-product rank of tensors (as we will see in Section 4.2). There are several equivalent ways of formulating this assertion.

PROPOSITION 4.1. Let $r \ge 2$ and $k \ge 3$. Given the norm-topology on $\mathbb{R}^{d_1 \times \cdots \times d_k}$, the following statements are equivalent:

(a) The set $\mathcal{S}_r(d_1, \dots, d_k) := \{A \in \mathbb{R}^{d_1 \times \cdots \times d_k} \mid \operatorname{rank}_\otimes(A) \le r\}$ is not closed.

(b) There exists a sequence $A_n \in \mathbb{R}^{d_1 \times \cdots \times d_k}$, $\operatorname{rank}_\otimes(A_n) \le r$, $n \in \mathbb{N}$, converging to $B \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ with $\operatorname{rank}_\otimes(B) > r$.

(c) There exists $B \in \mathbb{R}^{d_1 \times \cdots \times d_k}$, $\operatorname{rank}_\otimes(B) > r$, that may be approximated arbitrarily closely by tensors of strictly lower rank, i.e.

$$\inf\{\|B - A\| \mid \operatorname{rank}_\otimes(A) \le r\} = 0.$$

(d) There exists $C \in \mathbb{R}^{d_1 \times \cdots \times d_k}$, $\operatorname{rank}_\otimes(C) > r$, that does not have a best rank-$r$ approximation, i.e.

$$\inf\{\|C - A\| \mid \operatorname{rank}_\otimes(A) \le r\}$$

is not attained (by any $A$ with $\operatorname{rank}_\otimes(A) \le r$).

Proof. It is obvious that (a) ⇒ (b) ⇒ (c) ⇒ (d). To complete the chain, we just need to show that (d) ⇒ (a), which we do by proving the contrapositive. Suppose $S := \mathcal{S}_r(d_1, \dots, d_k)$ is closed, and let $C$ be any tensor. The closed ball of radius $\|C\|$ centered at $C$, $\{A \in \mathbb{R}^{d_1 \times \cdots \times d_k} \mid \|C - A\| \le \|C\|\}$, intersects $S$ non-trivially (e.g. $0$ is in both sets), so their intersection $T = \{A \in \mathbb{R}^{d_1 \times \cdots \times d_k} \mid \operatorname{rank}_\otimes(A) \le r,\ \|C - A\| \le \|C\|\}$ is a non-empty compact set. Now observe that

$$\delta := \inf\{\|C - A\| \mid A \in S\} = \inf\{\|C - A\| \mid A \in T\}$$

since any $A' \in S \setminus T$ must have $\|C - A'\| > \|C\|$ while we know that $\delta \le \|C\|$. By the compactness of $T$, there exists $A_* \in T$ such that $\|C - A_*\| = \delta$. So the required infimum is attained by $A_* \in T \subset S$. □

We caution the reader that there exist tensors of rank $> r$ that do not have a best rank-$r$ approximation but cannot be approximated arbitrarily closely by rank-$r$ tensors, i.e. $\inf\{\|C - A\| \mid \operatorname{rank}_\otimes(A) \le r\} > 0$. In other words, statement (d) applies to a strictly larger class of tensors than statement (c) (cf. Section 5). The tensors in statement (d) are sometimes called 'degenerate' in the psychometrics and chemometrics literature (e.g. [321 EHJ EH GUI [55]) but we prefer to avoid this term since it is inconsistent (and often at odds) with common usage in mathematics. For example, in Table 7.1, the tensors in the orbit classes of $D_2$, $D_2'$, $D_2''$ are all degenerate but statement (d) does not apply to them; on the other hand, the tensors in the orbit class of $G_3$ are non-degenerate but Theorem 8.1 tells us that they are all of the form in statement (d).

We begin by getting three well-behaved cases out of the way. The proofs shed 
light on what can go wrong in all the other cases. 

PROPOSITION 4.2. For all $d_1, \dots, d_k$, we have $\mathcal{S}_1(d_1, \dots, d_k) = \overline{\mathcal{S}}_1(d_1, \dots, d_k)$.

Proof. Suppose $A_n \to A$ where $\operatorname{rank}_\otimes(A_n) \le 1$. We can write

$$A_n = \lambda_n u_{1,n} \otimes u_{2,n} \otimes \cdots \otimes u_{k,n}$$

where $\lambda_n = \|A_n\|$ and the vectors $u_{i,n} \in \mathbb{R}^{d_i}$ have unit norm. Certainly $\lambda_n = \|A_n\| \to \|A\| =: \lambda$. Moreover, since the unit sphere in $\mathbb{R}^{d_i}$ is compact, each sequence $u_{i,n}$ has a convergent subsequence, with limit $u_i$, say. It follows that there is a subsequence of $A_n$ which converges to $\lambda u_1 \otimes \cdots \otimes u_k$. This must equal $A$, and it has rank at most 1. □

Proposition 4.3. For all $r$ and $d_1, d_2$, we have $\mathcal{S}_r(d_1, d_2) = \overline{\mathcal{S}}_r(d_1, d_2)$. In other words, matrix rank is upper semicontinuous.

Proof. Suppose $A_n \to A$ where $\operatorname{rank}(A_n) \le r$, so we can write

$$A_n = \lambda_{1,n} u_{1,n} \otimes v_{1,n} + \cdots + \lambda_{r,n} u_{r,n} \otimes v_{r,n}.$$

Convergence of the sequence $A_n$ does not imply convergence of the individual terms $\lambda_{i,n}$, $u_{i,n}$, $v_{i,n}$, even in a subsequence. However, if we take the singular value decomposition, then the $u_{i,n}$ and $v_{i,n}$ are unit vectors and the $\lambda_{i,n}$ satisfy

$$\lambda_{1,n}^2 + \cdots + \lambda_{r,n}^2 = \|A_n\|^2.$$

Since $\|A_n\| \to \|A\|$ this implies that the $\lambda_{i,n}$ are uniformly bounded. Thus we can find a subsequence with convergence $\lambda_{i,n} \to \lambda_i$, $u_{i,n} \to u_i$, $v_{i,n} \to v_i$ for all $i$. Then

$$A = \lambda_1 u_1 \otimes v_1 + \cdots + \lambda_r u_r \otimes v_r$$

which has rank at most $r$. □
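For matrices the best rank-$r$ approximation is not only guaranteed to exist but is given explicitly by truncating the singular value decomposition (the Eckart-Young theorem). The NumPy sketch below illustrates this under the Frobenius norm; the function name `best_rank_r` and the random test matrix are assumptions made for the example.

```python
import numpy as np

def best_rank_r(A, r):
    """Truncated SVD: keep the r largest singular triples of the matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
r = 2
A_r = best_rank_r(A, r)

# The Frobenius error equals the l2-norm of the discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_r)
print(np.linalg.matrix_rank(A_r), np.isclose(err, np.sqrt(np.sum(s[r:] ** 2))))
```

No analogous closed-form recipe is available for tensors of order 3 or higher, which is the point of the sections that follow.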

PROPOSITION 4.4. The multilinear rank function $\operatorname{rank}_\boxplus(A) = (r_1(A), \dots, r_k(A))$ is upper semicontinuous.

Proof. Each $r_i$ is the rank of a matrix obtained by rearranging the entries of $A$, and is therefore upper semicontinuous in $A$ by Proposition 4.3. □

Corollary 4.5. Every tensor has a best rank-1 approximation. Every matrix has a best rank-$r$ approximation. Every order-$k$ tensor has a best approximation with $\operatorname{rank}_\boxplus \le (r_1, \dots, r_k)$, for any specified $(r_1, \dots, r_k)$.

Proof. These statements follow from Propositions 4.2, 4.3 and 4.4, together with the implication (d) ⇒ (a) from Proposition 4.1. □

4.2. Tensor rank is not upper semicontinuous. Here is the simplest example of the failure of outer-product rank to be upper semicontinuous. This is the first example of a more general construction which we discuss in Section 4.7. A formula similar to (4.4) appeared as Exercise 62 in Section 4.6.4 of Knuth's The Art of Computer Programming [48] (the original source is unknown to us but may well be [48]). Other examples have appeared in [7] (the earliest known to us) and [62], as well as in unpublished work of Kruskal.

PROPOSITION 4.6. Let $x_1, y_1 \in \mathbb{R}^{d_1}$, $x_2, y_2 \in \mathbb{R}^{d_2}$ and $x_3, y_3 \in \mathbb{R}^{d_3}$ be vectors such that each pair $x_i, y_i$ is linearly independent. Then the tensor

$$A := x_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes x_3 \in \mathbb{R}^{d_1 \times d_2 \times d_3} \tag{4.4}$$

has rank 3 but can be approximated arbitrarily closely by tensors of rank 2. In particular, $A$ does not have a best rank-2 approximation.

Proof. For each $n \in \mathbb{N}$, define

$$A_n := n \left(x_1 + \tfrac{1}{n} y_1\right) \otimes \left(x_2 + \tfrac{1}{n} y_2\right) \otimes \left(x_3 + \tfrac{1}{n} y_3\right) - n\, x_1 \otimes x_2 \otimes x_3. \tag{4.5}$$

Clearly, $\operatorname{rank}_\otimes(A_n) \le 2$, and since, as $n \to \infty$,

$$\|A_n - A\|_F \le \frac{1}{n} \|y_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes y_3\|_F + \frac{1}{n^2} \|y_1 \otimes y_2 \otimes y_3\|_F \to 0,$$

we see that $A$ is approximated arbitrarily closely by the tensors $A_n$.

It remains to establish that $\operatorname{rank}_\otimes(A) = 3$. From the three-term format of $A$, we deduce only that $\operatorname{rank}_\otimes(A) \le 3$. A clean proof that $\operatorname{rank}_\otimes(A) > 2$ is included in the proof of Theorem 7.1, but this depends on the properties of the polynomial $\Delta$ defined in Section 5.3. A more direct argument is given in the next lemma. □
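The convergence $A_n \to A$ in (4.5) is easy to observe numerically. The NumPy sketch below is an illustration under simplifying assumptions: it takes $d_1 = d_2 = d_3 = 2$ and uses the same pair $x = e_1$, $y = e_2$ in every factor, and the helper names are ours. With these orthonormal vectors the error is exactly $\sqrt{3/n^2 + 1/n^4}$.

```python
import numpy as np

def outer3(a, b, c):
    """Decomposable order-3 tensor a o b o c."""
    return np.einsum('i,j,k->ijk', a, b, c)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# The rank-3 tensor of (4.4) with x_i = x and y_i = y in every factor.
A = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)

def A_n(n):
    """The rank-<=2 tensor of (4.5): a difference of two decomposable terms."""
    p = x + y / n
    return n * outer3(p, p, p) - n * outer3(x, x, x)

for n in (1, 10, 100, 1000):
    # The Frobenius distance to A shrinks roughly like sqrt(3)/n.
    print(n, np.linalg.norm(A_n(n) - A))
```

Each `A_n(n)` is a sum of two decomposable tensors, yet the distance to the rank-3 tensor `A` can be made as small as desired.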

Lemma 4.7. Let $x_1, y_1 \in \mathbb{R}^{d_1}$, $x_2, y_2 \in \mathbb{R}^{d_2}$, $x_3, y_3 \in \mathbb{R}^{d_3}$ and

$$A = x_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes x_3.$$

Then $\operatorname{rank}_\otimes(A) = 3$ if and only if $x_i, y_i$ are linearly independent for $i = 1, 2, 3$.

Proof. Only two distinct vectors are involved in each factor of the tensor product, so $\operatorname{rank}_\boxplus(A) \le (2, 2, 2)$ and we can work in $\mathbb{R}^{2 \times 2 \times 2}$ (Corollary 3.2). More strongly, if any of the pairs $\{x_i, y_i\}$ is linearly dependent, then $A$ is GL-equivalent to a tensor in $\mathbb{R}^{1 \times 2 \times 2}$, $\mathbb{R}^{2 \times 1 \times 2}$ or $\mathbb{R}^{2 \times 2 \times 1}$. These spaces are isomorphic to $\mathbb{R}^{2 \times 2}$, so the maximum possible rank of $A$ is 2.

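Conversely, suppose each pair $\{x_i, y_i\}$ is linearly independent. We may as well assume that

$$A = e_1 \otimes e_1 \otimes e_2 + e_1 \otimes e_2 \otimes e_1 + e_2 \otimes e_1 \otimes e_1, \tag{4.6}$$

that is, $a_{112} = a_{121} = a_{211} = 1$ and all other entries are zero, since we can transform $A$ to that form using a multilinear transformation $(L_1, L_2, L_3)$ where $L_i(x_i) = e_1$ and $L_i(y_i) = e_2$ for $i = 1, 2, 3$.

Suppose, for a contradiction, that $\operatorname{rank}_\otimes(A) \le 2$; then we can write

$$A = u_1 \otimes u_2 \otimes u_3 + v_1 \otimes v_2 \otimes v_3 \tag{4.7}$$

for some $u_i, v_i \in \mathbb{R}^{d_i}$.

Claim 1: The vectors $u_1, v_1$ are independent. If they are not, then let $\varphi : \mathbb{R}^2 \to \mathbb{R}$ be a nonzero linear map such that $\varphi(u_1) = \varphi(v_1) = 0$. Using the expressions in (4.7) and (4.6), we find that

$$0 = (\varphi, I, I) \cdot A = \begin{bmatrix} \varphi(e_2) & \varphi(e_1) \\ \varphi(e_1) & 0 \end{bmatrix} \in \mathbb{R}^{1 \times 2 \times 2} \cong \mathbb{R}^{2 \times 2},$$

which is a contradiction because $\varphi(e_1)$ and $\varphi(e_2)$ cannot both be zero.

Claim 2: The vectors $u_1, e_1$ are dependent. Indeed, let $\varphi_u : \mathbb{R}^2 \to \mathbb{R}$ be a linear map whose kernel is spanned by $u_1$. Then

$$\varphi_u(v_1)(v_2 \otimes v_3) = (\varphi_u, I, I) \cdot A = \begin{bmatrix} \varphi_u(e_2) & \varphi_u(e_1) \\ \varphi_u(e_1) & 0 \end{bmatrix} \in \mathbb{R}^{1 \times 2 \times 2} \cong \mathbb{R}^{2 \times 2}.$$

The lhs has rank at most 1, which implies on the rhs that $\varphi_u(e_1) = 0$, and hence $e_1 \in \operatorname{span}\{u_1\}$.

Claim 3: The vectors $v_1, e_1$ are dependent. Indeed, let $\varphi_v : \mathbb{R}^2 \to \mathbb{R}$ be a linear map whose kernel is spanned by $v_1$. Then

$$\varphi_v(u_1)(u_2 \otimes u_3) = (\varphi_v, I, I) \cdot A = \begin{bmatrix} \varphi_v(e_2) & \varphi_v(e_1) \\ \varphi_v(e_1) & 0 \end{bmatrix} \in \mathbb{R}^{1 \times 2 \times 2} \cong \mathbb{R}^{2 \times 2}.$$

The lhs has rank at most 1, which implies on the rhs that $\varphi_v(e_1) = 0$, and hence $e_1 \in \operatorname{span}\{v_1\}$.

Taken together, the three claims are inconsistent. This is the desired contradiction. Thus $\operatorname{rank}_\otimes(A) > 2$ and therefore $\operatorname{rank}_\otimes(A) = 3$. □

Remark. Note that if we take $d_1 = d_2 = d_3 = 2$, then (4.4) is an example of a tensor whose outer product rank exceeds $\min\{d_1, d_2, d_3\}$.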

4.3. Diverging coefficients. What goes wrong in the example of Proposition 4.6? Why do the rank-2 decompositions of the $A_n$ fail to converge to a rank-2 decomposition of $A$? We can attempt to mimic the proofs of Propositions 4.2 and 4.3 by seeking convergent subsequences for the rank-2 decompositions of the $A_n$. We fail because we cannot simultaneously keep all the variables bounded. For example, in the decomposition

$$A_n = n \left(x_1 + \tfrac{1}{n} y_1\right) \otimes \left(x_2 + \tfrac{1}{n} y_2\right) \otimes \left(x_3 + \tfrac{1}{n} y_3\right) - n\, x_1 \otimes x_2 \otimes x_3$$

the vector terms converge but the coefficients $\lambda_1 = n$ and $\lambda_2 = -n$ diverge in absolute value. In spite of this, the sequence $A_n$ itself remains bounded.
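The cancellation between the two diverging terms can be made visible by normalizing the vectors. The NumPy sketch below is an illustration under the simplifying assumption $x_i = e_1$, $y_i = e_2$ in $\mathbb{R}^2$; the function name `unit_norm_decomposition` is hypothetical. It rewrites $A_n$ with unit vectors and prints the two coefficients next to the norm of their sum.

```python
import numpy as np

def outer3(a, b, c):
    """Decomposable order-3 tensor a o b o c."""
    return np.einsum('i,j,k->ijk', a, b, c)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

def unit_norm_decomposition(n):
    """Write A_n = lam1 * p^ o p^ o p^ + lam2 * x o x o x with unit vectors."""
    p = x + y / n
    p_hat = p / np.linalg.norm(p)
    lam1 = n * np.linalg.norm(p) ** 3   # grows like n
    lam2 = -float(n)                    # x is already a unit vector
    A_n = lam1 * outer3(p_hat, p_hat, p_hat) + lam2 * outer3(x, x, x)
    return lam1, lam2, A_n

for n in (1, 10, 100, 1000):
    lam1, lam2, A_n = unit_norm_decomposition(n)
    # Coefficients blow up, while the norm of A_n tends to sqrt(3).
    print(n, lam1, lam2, np.linalg.norm(A_n))
```

Both coefficients are unbounded, and it is only their near-cancellation that keeps the sum bounded.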

In fact, rank-jumping always occurs like this (see also [IH]).

PROPOSITION 4.8. Suppose $A_n \to A$, where $\operatorname{rank}_\otimes(A) \ge r + 1$ and $\operatorname{rank}_\otimes(A_n) \le r$ for all $n$. If we write

$$A_n = \lambda_{1,n} u_{1,n} \otimes v_{1,n} \otimes w_{1,n} + \cdots + \lambda_{r,n} u_{r,n} \otimes v_{r,n} \otimes w_{r,n},$$

where the vectors $u_{i,n}$, $v_{i,n}$, $w_{i,n}$ are unit vectors, then $\max_i\{|\lambda_{i,n}|\} \to \infty$ as $n \to \infty$. Moreover, at least two of the coefficient sequences $\{\lambda_{i,n} \mid n = 1, 2, \dots\}$ are unbounded.

Proof. If the sequence $\max_i\{|\lambda_{i,n}|\}$ does not diverge to $\infty$, then it has a bounded subsequence. In this subsequence, the coefficients and vectors are all bounded, so we can pass to a further subsequence in which each of the coefficient sequences and vector sequences is convergent:

$$\lambda_{i,n} \to \lambda_i, \quad u_{i,n} \to u_i, \quad v_{i,n} \to v_i, \quad w_{i,n} \to w_i.$$

It follows that $A = \lambda_1 u_1 \otimes v_1 \otimes w_1 + \cdots + \lambda_r u_r \otimes v_r \otimes w_r$, so it has rank at most $r$, which is a contradiction.

Thus $\max_i\{|\lambda_{i,n}|\}$ diverges to $\infty$. It follows that at least one of the coefficient sequences has a divergent subsequence. If there were only one such coefficient sequence, all the others being bounded, then (on the subsequence) $A_n$ would be dominated by this term and consequently $\|A_n\|$ would be unbounded. Since $A_n \to A$, this cannot happen. Thus there are at least two unbounded coefficient sequences. □

For a minimal rank-jumping example, all the coefficients must diverge to $\infty$.

PROPOSITION 4.9. Suppose $A_n \to A$, where $\operatorname{rank}_\otimes(A) = r + s$ and $\operatorname{rank}_\otimes(A_n) \le r$ for all $n$. If we write

$$A_n = \lambda_{1,n} u_{1,n} \otimes v_{1,n} \otimes w_{1,n} + \cdots + \lambda_{r,n} u_{r,n} \otimes v_{r,n} \otimes w_{r,n},$$

where the vectors $u_{i,n}$, $v_{i,n}$, $w_{i,n}$ are unit vectors, then there are two possibilities: either (i) all of the sequences $|\lambda_{i,n}|$ diverge to $\infty$ as $n \to \infty$; or (ii) in the same tensor space there exists $B_n \to B$, where $\operatorname{rank}_\otimes(B) \ge r' + s$ and $\operatorname{rank}_\otimes(B_n) \le r'$ for all $n$, for some $r' < r$.

Proof. Suppose one of the coefficient sequences, say $|\lambda_{i,n}|$, fails to diverge as $n \to \infty$; so it has a bounded subsequence. In a further subsequence, the $i$th term $R_n = \lambda_{i,n} u_{i,n} \otimes v_{i,n} \otimes w_{i,n}$ converges to a tensor $R$ of rank (at most) 1. Writing $B_n = A_n - R_n$, we find that $B_n \to B = A - R$ on this subsequence, with $\operatorname{rank}_\otimes(B_n) \le r - 1$. Moreover, $r + s \le \operatorname{rank}_\otimes(A) \le \operatorname{rank}_\otimes(B) + \operatorname{rank}_\otimes(R)$, so $\operatorname{rank}_\otimes(B) \ge (r - 1) + s$. □

Remark. Clearly the arguments in Propositions 4.8 and 4.9 apply to tensors of all orders, not just order 3. We also note that the vectors ($u_{i,n}$ etc.) need not be unit vectors; they just have to be uniformly bounded.

One interpretation of Proposition 4.8 is that if one attempts to minimize

$$\|A - \lambda_1 u_1 \otimes v_1 \otimes w_1 - \cdots - \lambda_r u_r \otimes v_r \otimes w_r\|$$

for a tensor $A$ which does not have a best rank-$r$ approximation, then (at least some of) the coefficients $\lambda_i$ become unbounded. This phenomenon of diverging summands has been observed in practical applications of multilinear models in psychometrics and chemometrics and is commonly referred to in those circles as 'candecomp/parafac degeneracy' or 'diverging candecomp/parafac components' [HI [HI [S3 ISHJ. More precisely, these are called '$k$-factor degeneracies' when there are $k$ diverging summands whose sum stays bounded. 2- and 3-factor degeneracies were exhibited in [62] and 4- and 5-factor degeneracies were exhibited in [67]. There are uninteresting (see Section 4.5) and interesting (see Section 4.7) ways of generating $k$-factor degeneracies for arbitrarily large $k$.

4.4. Higher orders, higher ranks, arbitrary norms. We will now show that the rank-jumping phenomenon (that is, the failure of $\mathcal{S}_r(d_1, \dots, d_k)$ to be closed) is independent of the choice of norms and can be extended to arbitrary order. The norm independence is a trivial consequence of a basic fact in functional analysis: all norms on finite dimensional vector spaces are equivalent; in particular, any norm will induce the same unique topology on a finite dimensional vector space.

Theorem 4.10. For $k \ge 3$, and $d_1, \dots, d_k \ge 2$, the problem of determining a best rank-$r$ approximation for an order-$k$ tensor in $\mathbb{R}^{d_1 \times \cdots \times d_k}$ has no solution in general for any $r = 2, \dots, \min\{d_1, \dots, d_k\}$. In particular, there exists $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ with

$$\operatorname{rank}_\otimes(A) = r + 1$$

that has no best rank-$r$ approximation. The result is independent of the choice of norms.

Proof. We begin by assuming $k = 3$.

Higher rank. Let $2 \le r \le \min\{d_1, d_2, d_3\}$. By Lemma 3.5, we can construct a tensor $B \in \mathbb{R}^{(d_1 - 2) \times (d_2 - 2) \times (d_3 - 2)}$ with rank $r - 2$. By Proposition 4.6, we can construct a convergent sequence of tensors $C_n \to C$ in $\mathbb{R}^{2 \times 2 \times 2}$ with $\operatorname{rank}_\otimes(C_n) \le 2$, and $\operatorname{rank}_\otimes(C) = 3$. Let $A_n = B \oplus C_n \in \mathbb{R}^{d_1 \times d_2 \times d_3}$. Then $A_n \to A := B \oplus C$ and $\operatorname{rank}_\otimes(A_n) \le \operatorname{rank}_\otimes(B) + \operatorname{rank}_\otimes(C_n) \le r$. The result of JaJa-Takche (Theorem 3.4) implies that $\operatorname{rank}_\otimes(A) = \operatorname{rank}_\otimes(B) + \operatorname{rank}_\otimes(C) = r + 1$.

Arbitrary order. Let $u_4 \in \mathbb{R}^{d_4}, \dots, u_k \in \mathbb{R}^{d_k}$ be unit vectors and set

$$\tilde{A}_n := A_n \otimes u_4 \otimes \cdots \otimes u_k, \qquad \tilde{A} := A \otimes u_4 \otimes \cdots \otimes u_k.$$

By (4.2),

$$\|\tilde{A}_n - \tilde{A}\|_F = \|A_n - A\|_F = \|B \oplus C_n - B \oplus C\|_F = \|C_n - C\|_F \to 0, \quad \text{as } n \to \infty.$$

Moreover, Corollary 3.3 ensures that $\operatorname{rank}_\otimes(\tilde{A}) = r + 1$ and $\operatorname{rank}_\otimes(\tilde{A}_n) \le r$.

Norm independence. Whether the sequence $\tilde{A}_n$ converges to $\tilde{A}$ is entirely dependent on the norm-induced topology on $\mathbb{R}^{d_1 \times \cdots \times d_k}$. Since it has a unique topology induced by any of its equivalent norms as a finite-dimensional vector space, the convergence is independent of the choice of norms. □
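The two steps of this proof, the direct sum and the extension to higher order, can be sketched in NumPy as follows. The sketch is illustrative only: it assumes $d_1 = d_2 = d_3 = 4$ and $r = 4$, takes $B = e_1^{\otimes 3} + e_2^{\otimes 3}$ as the rank-2 tensor from Lemma 3.5, and the helper names `direct_sum` and `C_n` are ours. It checks the norm identities used above, not the rank statements, which rest on Theorem 3.4.

```python
import numpy as np

def outer3(a, b, c):
    """Decomposable order-3 tensor a o b o c."""
    return np.einsum('i,j,k->ijk', a, b, c)

def direct_sum(B, C):
    """Block-diagonal direct sum of two tensors of the same order."""
    out = np.zeros(tuple(b + c for b, c in zip(B.shape, C.shape)))
    out[tuple(slice(0, b) for b in B.shape)] = B
    out[tuple(slice(b, None) for b in B.shape)] = C
    return out

e1, e2 = np.eye(2)
B = outer3(e1, e1, e1) + outer3(e2, e2, e2)                      # rank r - 2 = 2
C = outer3(e1, e1, e2) + outer3(e1, e2, e1) + outer3(e2, e1, e1)  # rank 3, as in (4.4)

def C_n(n):
    """Rank-<=2 approximant of C, as in (4.5)."""
    p = e1 + e2 / n
    return n * outer3(p, p, p) - n * outer3(e1, e1, e1)

A = direct_sum(B, C)             # limit tensor in R^{4x4x4}
A_50 = direct_sum(B, C_n(50))    # approximant built from at most 4 terms

# Order-4 extension by a unit vector u4; norms are unchanged by (4.2).
u4 = np.array([0.6, 0.8])
A_tilde = np.multiply.outer(A, u4)
A_tilde_50 = np.multiply.outer(A_50, u4)

print(np.linalg.norm(A_50 - A), np.linalg.norm(C_n(50) - C),
      np.linalg.norm(A_tilde_50 - A_tilde))
```

All three printed distances agree, reflecting the chain of equalities in the proof.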

We note that the proof above exhibits an order-$k$ tensor, namely $\tilde{A}$, that has rank strictly larger than $\min\{d_1, \dots, d_k\}$.

4.5. Tensor rank can leap an arbitrarily large gap. How can we construct a sequence of tensors of rank $r$ that converge to a tensor of rank $r + 2$? An easy trick is to take the direct sum of two sequences of rank-2 tensors of the form shown in (4.5). The resulting sequence converges to a limiting tensor that is the direct sum of two rank-3 tensors, each of the form shown in (4.4). To show that the limiting tensor has rank 6 (and does not have some miraculous lower-rank decomposition), we once again turn to the theorem of JaJa-Takche, which contains just enough of the direct sum conjecture (3.1) for our purposes.

Proposition 4.11. Given any $s \in \mathbb{N}$ and $r \ge 2s$, there exists a sequence of order-3 tensors $B_n$ such that $\operatorname{rank}_\otimes(B_n) \le r$ and $\lim_{n \to \infty} B_n = B$ with $\operatorname{rank}_\otimes(B) = r + s$.

Proof. Let $d = r - 2s$. By Lemma 3.5, there exists a rank-$d$ tensor $C \in \mathbb{R}^{d \times d \times d}$. Let $A_n \to A$ be a convergent sequence in $\mathbb{R}^{2 \times 2 \times 2}$ with $\operatorname{rank}_\otimes(A_n) \le 2$ and $\operatorname{rank}_\otimes(A) = 3$. Define

$$B_n = C \oplus A_n \oplus \cdots \oplus A_n, \qquad B = C \oplus A \oplus \cdots \oplus A$$

where there are $s$ terms $A_n$ and $A$. Then $B_n \to B$, and $\operatorname{rank}_\otimes(B_n) \le r - 2s + 2s = r$. By applying the JaJa-Takche theorem sequentially $s$ times, once for each summand $A$, we deduce that $\operatorname{rank}_\otimes(B) = r - 2s + 3s = r + s$. □

As usual the construction can be extended to order-$k$ tensors, by taking an outer product with a suitable number of non-zero vectors in the new factors.

Corollary 4.12. Given any $s \ge 1$, $r \ge 2$, and $k \ge 3$, with $r \ge 2s$, there exists $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ such that $\operatorname{rank}_\otimes(A) = r + s$ and $A$ has no best rank-$r$ approximation.

Proof. This follows from Proposition 4.11 and the previous remark. □

4.6. Bregman divergences and other continuous measures of proximity. In data analytic applications, one frequently encounters low-rank approximations with respect to 'distances' that are more general than norms. Such a 'distance' may not even be a metric, an example being the Bregman divergence [10, 26] (sometimes also known as Bregman distance). The definition here is based on the definition given in [26]. Recall first that if $S \subset \mathbb{R}^n$, the relative interior of $S$ is simply the interior of $S$ considered as a subset of its affine hull, and is denoted by $\operatorname{ri}(S)$.

Definition 4.13. Let $S \subset \mathbb{R}^{d_1 \times \cdots \times d_k}$ be a convex set. Let $\varphi : S \to \mathbb{R}$ be a lower semicontinuous, convex function that is continuously differentiable and strictly convex in $\operatorname{ri}(S)$. Let $\varphi$ have the property that for any sequence $\{C_n\} \subset \operatorname{ri}(S)$ that converges to $C \in S \setminus \operatorname{ri}(S)$, we have:

$$\lim_{n \to \infty} \|\nabla \varphi(C_n)\| = +\infty.$$

The Bregman divergence $D_\varphi : S \times \operatorname{ri}(S) \to \mathbb{R}$ is defined by

$$D_\varphi(A, B) = \varphi(A) - \varphi(B) - \langle \nabla \varphi(B), A - B \rangle.$$

It is natural to ask if the analogous problem APPROX($A$, $r$) for Bregman divergence will always have a solution. Note that a Bregman divergence, unlike a metric, is not necessarily symmetric in its two arguments and thus there are two possible problems:

$$\operatorname{argmin}_{\operatorname{rank}_\otimes(B) \le r} D_\varphi(A, B) \quad \text{and} \quad \operatorname{argmin}_{\operatorname{rank}_\otimes(B) \le r} D_\varphi(B, A).$$

As the following proposition shows, the answer is no in both cases.

Proposition 4.14. Let $D_\varphi$ be a Bregman divergence. Let $A$ and $A_n$ be defined as in (4.4) and (4.5) respectively. Then

$$\lim_{n \to \infty} D_\varphi(A, A_n) = 0 = \lim_{n \to \infty} D_\varphi(A_n, A).$$

Proof. The Bregman divergence is jointly continuous in both arguments with respect to the norm topology, and $A_n \to A$ in norm, so $D_\varphi(A, A_n) \to D_\varphi(A, A) = 0$ and $D_\varphi(A_n, A) \to D_\varphi(A, A) = 0$. □
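To make the definition concrete, the NumPy sketch below evaluates a Bregman divergence along the sequence $A_n$. It is an illustration under stated assumptions: it uses the particular generator $\varphi(T) = \tfrac{1}{2}\|T\|_F^2$ on the whole space (for which $D_\varphi(A, B) = \tfrac{1}{2}\|A - B\|_F^2$ and the boundary condition in Definition 4.13 is vacuous), takes $x_i = e_1$, $y_i = e_2$, and the function names are ours.

```python
import numpy as np

def outer3(a, b, c):
    """Decomposable order-3 tensor a o b o c."""
    return np.einsum('i,j,k->ijk', a, b, c)

def bregman(phi, grad_phi, A, B):
    """D_phi(A, B) = phi(A) - phi(B) - <grad phi(B), A - B>."""
    return phi(A) - phi(B) - np.sum(grad_phi(B) * (A - B))

# Assumed generator: phi(T) = 0.5 * ||T||_F^2, with gradient T.
phi = lambda T: 0.5 * np.sum(T ** 2)
grad_phi = lambda T: T

x, y = np.eye(2)
A = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)   # as in (4.4)

def A_n(n):
    """Rank-<=2 approximant of A, as in (4.5)."""
    p = x + y / n
    return n * outer3(p, p, p) - n * outer3(x, x, x)

for n in (10, 100, 1000):
    # Both orderings of the arguments tend to zero.
    print(n, bregman(phi, grad_phi, A, A_n(n)), bregman(phi, grad_phi, A_n(n), A))
```

For this symmetric generator the two orderings coincide; for a non-symmetric divergence they differ, but both still tend to zero by continuity.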

Proposition 4.14 extends trivially to any other measure of nearness that is continuous with respect to the norm topology in at least one argument.

4.7. Difference quotients. We thank Landsberg [S3] for the insight that the 
expression in (|4.4p is best regarded as a derivative. Indeed, if 

f(t) = (x + iy)® 3 = (x + ty) ® (x + ty) <g> (x + ty) 



then 



'1 
dt 

by the Leibniz rule. On the other hand 



■ X OS) X i 



'1 

dt 



= lim 



(x + ty) ® (x + ty) (g> (x + ty) - x i 



and the difference quotient on the right-hand side has rank 2. The expression in (14.51) 
can be obtained from this by taking t = 1/N. 
We can extend Landsberg's idea to more general partial differential operators.

It will be helpful to use the degree-k Veronese map [37], which is
$V_k(x) = x^{\otimes k} = x \otimes \cdots \otimes x$ (k-fold product). Then, for example,
the 6-term symmetric tensor

\[ x \otimes y \otimes z + x \otimes z \otimes y + y \otimes z \otimes x + y \otimes x \otimes z + z \otimes x \otimes y + z \otimes y \otimes x \]

can be written as a partial derivative

\[ \left.\frac{\partial^2}{\partial s\,\partial t}\, V_3(x + sy + tz)\right|_{s=t=0} \]

which is a limit of a 4-term difference quotient:

\[ \lim_{s,t\to 0} \frac{V_3(x + sy + tz) - V_3(x + sy) - V_3(x + tz) + V_3(x)}{st}. \]
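As a numerical illustration of the 4-term difference quotient, the sketch below takes x, y, z to be the standard basis of R^3 (an assumption made only for this example) and measures how far the quotient is from the 6-term symmetric tensor as s = t = h shrinks. The function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def veronese3(v):
    """Degree-3 Veronese map V_3(v) = v (x) v (x) v."""
    return np.einsum("i,j,k->ijk", v, v, v)

# Assumed vectors: the standard basis of R^3 (linearly independent).
x, y, z = np.eye(3)

# The 6-term symmetric tensor: sum over all orderings of (x, y, z).
target = sum(np.einsum("i,j,k->ijk", *p) for p in permutations((x, y, z)))

def quotient(s, t):
    """4-term mixed difference quotient approximating d^2/ds dt at s = t = 0."""
    return (veronese3(x + s * y + t * z) - veronese3(x + s * y)
            - veronese3(x + t * z) + veronese3(x)) / (s * t)

gaps = [np.linalg.norm(quotient(h, h) - target) for h in (1e-1, 1e-2, 1e-3)]
print(gaps)
```

The gap decreases roughly linearly in h, consistent with the quotient being a first-order accurate approximation of the mixed derivative.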



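This example lies naturally in $\mathbb{R}^{3\times3\times3}$, taking x, y, z to be linearly independent.
Another example, in $\mathbb{R}^{2\times2\times2\times2}$, is the 6-term symmetric order-4 tensor

\[ x \otimes x \otimes y \otimes y + x \otimes y \otimes x \otimes y + x \otimes y \otimes y \otimes x + y \otimes x \otimes x \otimes y + y \otimes x \otimes y \otimes x + y \otimes y \otimes x \otimes x. \]

This can be written as the second-order derivative

\[ \left.\frac{1}{2!}\,\frac{\partial^2}{\partial t^2}\,(x + ty)^{\otimes 4}\right|_{t=0} \]

which is a limit of a 3-term difference quotient:

\[ \lim_{t\to 0} \frac{V_4(x + 2ty) - 2V_4(x + ty) + V_4(x)}{2!\,t^2}. \]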



We call these examples symmetric Leibniz tensors for the differential operators
$\partial^2/\partial s\,\partial t$ and $\partial^2/\partial t^2$, of orders 3 and 4, respectively. More generally, given positive
integers k and $a_1, \dots, a_j$ with $a_1 + \cdots + a_j = a \le k$, the symmetric tensor

\[ L_k(a_1, \dots, a_j) := \sum_{\mathrm{Sym}} x^{\otimes(k-a)} \otimes y_1^{\otimes a_1} \otimes \cdots \otimes y_j^{\otimes a_j} \]

can be written as a partial derivative

\[ \left.\frac{1}{a_1! \cdots a_j!}\, \frac{\partial^a V_k(x + t_1 y_1 + \cdots + t_j y_j)}{\partial t_1^{a_1} \cdots \partial t_j^{a_j}}\right|_{t_1 = \cdots = t_j = 0} \]

which is a limit of a difference quotient with $(a_1 + 1) \cdots (a_j + 1)$ terms. On the
other hand, the number of terms in the limit $L_k(a_1, \dots, a_j)$ is given by a multinomial
coefficient, and that is usually much bigger.

This construction gives us a ready supply of candidates for rank-jumping. However,
we do not know (even for the two explicit 6-term examples above) whether
the limiting tensors actually have the ranks suggested by their formulas. We can show
that $\operatorname{rank}_\otimes(L_k(1)) = k$, for all k and over any field, generalizing Lemma 4.7. Beyond
that it is not clear to us what is likely to be true. The optimistic conjecture is:

\[ \operatorname{rank}_\otimes(L_k(a_1, \dots, a_j)) \overset{?}{=} \binom{k}{k-a,\, a_1, \dots, a_j} = \frac{k!}{(k-a)!\, a_1! \cdots a_j!}. \tag{4.8} \]

Comon et al. [18] show that the symmetric rank of $L_k(1)$ over the complex numbers
is k, so that is another possible context in which (4.8) may be true.

5. Characterizing the limit points of order-3 rank-2 tensors. If an order-3
tensor can be expressed as a limit of a sequence of rank-2 tensors, but itself has
rank greater than 2, then we show in this section that it takes a particular form. This
kind of result may make it possible to overcome the ill-posedness of APPROX(A, r), by
defining weak solutions.

Theorem 5.1. Let $d_1, d_2, d_3 \ge 2$. Let $A_n \in \mathbb{R}^{d_1\times d_2\times d_3}$ be a sequence of tensors
with $\operatorname{rank}_\otimes(A_n) \le 2$ and

\[ \lim_{n\to\infty} A_n = A, \]

where the limit is taken in any norm topology. If the limiting tensor A has rank higher
than 2, then $\operatorname{rank}_\otimes(A)$ must be exactly 3 and there exist pairs of linearly independent
vectors $x_1, y_1 \in \mathbb{R}^{d_1}$, $x_2, y_2 \in \mathbb{R}^{d_2}$, $x_3, y_3 \in \mathbb{R}^{d_3}$ such that

\[ A = x_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes x_3. \tag{5.1} \]

The proof of this theorem will occupy the next few subsections. 

5.1. Reduction. Our first step is to show that we can limit our attention to the
particular tensor space $\mathbb{R}^{2\times2\times2}$. Here the orthogonal group action is important. Recall
that the actions of $\mathrm{O}_{d_1,\dots,d_k}(\mathbb{R})$ and $\mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$ on $\mathbb{R}^{d_1\times\cdots\times d_k}$ are continuous and
carry decomposable tensors to decomposable tensors. It follows that the subspaces
$\mathcal{S}_r$ and $\overline{\mathcal{S}}_r$ are preserved. The next theorem provides a general mechanism for passing
to a tensor subspace.

Theorem 5.2. Let $r_i = \min(r, d_i)$ for all i. The restricted maps

\[ \mathrm{O}_{d_1,\dots,d_k}(\mathbb{R}) \times \mathcal{S}_r(r_1, \dots, r_k) \to \mathcal{S}_r(d_1, \dots, d_k) \]
\[ \mathrm{O}_{d_1,\dots,d_k}(\mathbb{R}) \times \overline{\mathcal{S}}_r(r_1, \dots, r_k) \to \overline{\mathcal{S}}_r(d_1, \dots, d_k) \]

given by $((L_1, \dots, L_k), A) \mapsto (L_1, \dots, L_k) \cdot A$ are both surjective.

In other words, every rank-r tensor in $\mathbb{R}^{d_1\times\cdots\times d_k}$ is equivalent by an orthogonal
transformation to a rank-r tensor in the smaller space $\mathbb{R}^{r_1\times\cdots\times r_k}$. Similarly every
rank-r-approximable tensor in $\mathbb{R}^{d_1\times\cdots\times d_k}$ is equivalent to a rank-r-approximable tensor in
$\mathbb{R}^{r_1\times\cdots\times r_k}$.
Proof. If $A \in \mathcal{S}_r(d_1, \dots, d_k)$ is any rank-r tensor then we can write
$A = \sum_{j=1}^{r} x_1^j \otimes \cdots \otimes x_k^j$ for vectors $x_i^j \in \mathbb{R}^{d_i}$. For each i, the vectors $x_i^1, \dots, x_i^r$ span a subspace
$V_i \subseteq \mathbb{R}^{d_i}$ of dimension at most $r_i$. Choose $L_i \in \mathrm{O}_{d_i}(\mathbb{R})$ so that $L_i(\mathbb{R}^{r_i}) \supseteq V_i$. Let
$B = (L_1^{-1}, \dots, L_k^{-1}) \cdot A$. Then $A = (L_1, \dots, L_k) \cdot B$ and $B \in \mathcal{S}_r(r_1, \dots, r_k)$. This
argument shows that the first of the maps is surjective.

Now let $A \in \overline{\mathcal{S}}_r(d_1, \dots, d_k)$ be any rank-r-approximable tensor. Let $(A^{(n)})_{n=1}^{\infty}$
be any sequence of rank-r tensors converging to A. For each n, by the preceding
result, we can find $B^{(n)} \in \mathcal{S}_r(r_1, \dots, r_k)$ and $(L_1^{(n)}, \dots, L_k^{(n)}) \in \mathrm{O}_{d_1,\dots,d_k}(\mathbb{R})$ with
$(L_1^{(n)}, \dots, L_k^{(n)}) \cdot B^{(n)} = A^{(n)}$. Since $\mathrm{O}_{d_1,\dots,d_k}(\mathbb{R})$ is compact, there is a convergent
subsequence $(L_1^{(n_j)}, \dots, L_k^{(n_j)}) \to (L_1, \dots, L_k)$. Let $B = (L_1, \dots, L_k)^{-1} \cdot A$. Then
$A = (L_1, \dots, L_k) \cdot B$; and $B^{(n_j)} = (L_1^{(n_j)}, \dots, L_k^{(n_j)})^{-1} \cdot A^{(n_j)} \to (L_1, \dots, L_k)^{-1} \cdot A = B$,
so $B \in \overline{\mathcal{S}}_r(r_1, \dots, r_k)$. Thus the second map is also surjective. □

Corollary 5.3. If Theorem 5.1 is true for the tensor space $\mathbb{R}^{2\times2\times2}$ then it is
true in general.

Proof. The general case is $V_1 \otimes V_2 \otimes V_3 \cong \mathbb{R}^{d_1\times d_2\times d_3}$. Suppose $A \in \overline{\mathcal{S}}_2(d_1, d_2, d_3)$
and $\operatorname{rank}_\otimes(A) \ge 3$. By Theorem 5.2 there exist $(L_1, L_2, L_3) \in \mathrm{O}_{d_1,d_2,d_3}(\mathbb{R})$ and
$B \in \overline{\mathcal{S}}_2(2, 2, 2)$ with $(L_1, L_2, L_3) \cdot B = A$. Moreover $\operatorname{rank}_\otimes(B) = \operatorname{rank}_\otimes(A) \ge 3$ in
$\mathbb{R}^{d_1\times d_2\times d_3}$ and hence $\operatorname{rank}_\otimes(B) \ge 3$ in $\mathbb{R}^{2\times2\times2}$ by Proposition 3.1. Since the theorem
is assumed true for $\mathbb{R}^{2\times2\times2}$ and B satisfies the hypotheses, it can be written in the
specified form in terms of vectors $x_1, x_2, x_3$ and $y_1, y_2, y_3$. It follows that A takes the
same form with respect to the vectors $L_1x_1, L_2x_2, L_3x_3$ and $L_1y_1, L_2y_2, L_3y_3$. □

5.2. Tensors of rank 1 and 2. We establish two simple facts, for later use.

Proposition 5.4. If $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ has rank 1, then we can write
$A = (L_1, \dots, L_k) \cdot B$, where $(L_1, \dots, L_k) \in \mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$ and $B = e_1 \otimes \cdots \otimes e_1$.

Proof. Write $A = x_1 \otimes \cdots \otimes x_k$ and choose the $L_i$ so that $L_i(e_1) = x_i$. □

Proposition 5.5. Assume $d_i \ge 2$ for all i. If $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ has rank 2, then we
can write $A = (L_1, \dots, L_k) \cdot B$, where $(L_1, \dots, L_k) \in \mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$ and $B \in \mathbb{R}^{2\times\cdots\times2}$
is of the form $B = e_1 \otimes \cdots \otimes e_1 + f_1 \otimes \cdots \otimes f_k$. Here $e_1$ denotes the standard basis
vector $(1, 0)^\top$; each $f_i$ is equal either to $e_1$ or to $e_2 = (0, 1)^\top$; and at least two of the $f_i$
are equal to $e_2$.

Proof. We can write $A = x_1 \otimes \cdots \otimes x_k + y_1 \otimes \cdots \otimes y_k$. Since $\operatorname{rank}_\otimes(A) = 2$ all
of the $x_i$ and $y_i$ must be nonzero. We claim that $y_i, x_i$ must be linearly independent
for at least two different indices i. Otherwise, suppose $y_i = \lambda_i x_i$ for $k - 1$ different
indices, say $i = 1, \dots, k - 1$. It would follow that

\[ A = x_1 \otimes \cdots \otimes x_{k-1} \otimes (x_k + (\lambda_1 \cdots \lambda_{k-1})\, y_k) \]

contradicting $\operatorname{rank}_\otimes(A) = 2$.

For each i choose $L_i : \mathbb{R}^2 \to \mathbb{R}^{d_i}$ such that $L_i e_1 = x_i$, and such that $L_i e_2 = y_i$ if
$y_i$ is linearly independent of $x_i$; otherwise $L_i e_2$ may be arbitrary. It is easy to check
that $(L_1, \dots, L_k)^{-1} \cdot A = e_1 \otimes \cdots \otimes e_1 + \lambda f_1 \otimes \cdots \otimes f_k$ where the $f_i$ are as specified in
the theorem, and $\lambda$ is the product of the $\lambda_i$ over those indices where $y_i = \lambda_i x_i$. This
is almost in the correct form. To get rid of the $\lambda$, replace $L_i e_2 = y_i$ with $L_i e_2 = \lambda y_i$
at one of the indices i for which $x_i, y_i$ are linearly independent. This completes the
construction. □

5.3. The discriminant polynomial Δ. The structure of tensors in $\mathbb{R}^{2\times2\times2}$ is
largely governed by a quartic polynomial Δ which we define and discuss here. This
same polynomial was discovered by Cayley in 1845 [15]. More generally, Δ is the
2 × 2 × 2 special case of an object called the hyperdeterminant, revived in its modern
form by Gelfand, Kapranov, and Zelevinsky [30, 31]. We give an elementary treatment
of the properties we need.

As in our discussion in Section 2.1, we identify a tensor $A \in \mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$ with
the array $A \in \mathbb{R}^{2\times2\times2}$ of its eight coefficients with respect to the standard basis
$\{e_i \otimes e_j \otimes e_k : i, j, k = 1, 2\}$. Pictorially, we can represent it as a pair of side-by-side
2 × 2 slabs:

\[ A = \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} a_{ijk}\, e_i \otimes e_j \otimes e_k = \left[\begin{array}{cc|cc} a_{111} & a_{112} & a_{211} & a_{212} \\ a_{121} & a_{122} & a_{221} & a_{222} \end{array}\right]. \]

The general strategy is to find ways of simplifying the representation of A by applying
transformations in $\mathrm{GL}_{2,2,2}(\mathbb{R}) = \mathrm{GL}_2(\mathbb{R}) \times \mathrm{GL}_2(\mathbb{R}) \times \mathrm{GL}_2(\mathbb{R})$. This group is
generated by the following operations: elementary row operations applied to both
slabs simultaneously; elementary column operations applied to both slabs simultaneously;
elementary slab operations (for example, adding a multiple of one slab to
the other).

Slab operations on a tensor $A = [A_1 \mid A_2]$ generate new 2 × 2 slabs of the form
$S = \lambda_1 A_1 + \lambda_2 A_2$. One can check that:

\[ \det(S) = \lambda_1^2 \det(A_1) + \lambda_1\lambda_2\, \frac{\det(A_1 + A_2) - \det(A_1 - A_2)}{2} + \lambda_2^2 \det(A_2). \tag{5.2} \]

We define Δ to be the discriminant of this quadratic polynomial:

\[ \Delta([A_1 \mid A_2]) = \left[\frac{\det(A_1 + A_2) - \det(A_1 - A_2)}{2}\right]^2 - 4\det(A_1)\det(A_2). \tag{5.3} \]

Explicitly, if $A = [a_{ijk}]_{i,j,k=1,2} \in \mathbb{R}^{2\times2\times2}$, then

\[ \begin{aligned} \Delta(A) ={} & (a_{111}^2 a_{222}^2 + a_{112}^2 a_{221}^2 + a_{121}^2 a_{212}^2 + a_{122}^2 a_{211}^2) \\ & - 2(a_{111}a_{112}a_{221}a_{222} + a_{111}a_{121}a_{212}a_{222} + a_{111}a_{122}a_{211}a_{222} \\ & \qquad + a_{112}a_{121}a_{212}a_{221} + a_{112}a_{122}a_{221}a_{211} + a_{121}a_{122}a_{212}a_{211}) \\ & + 4(a_{111}a_{122}a_{212}a_{221} + a_{112}a_{121}a_{211}a_{222}). \end{aligned} \]
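The agreement between the discriminant definition (5.3) and the explicit quartic can be checked numerically. The sketch below assumes NumPy and stores a tensor as an array with `A[i-1, j-1, k-1]` equal to a_ijk, so that `A[0]` and `A[1]` are the slabs A_1 and A_2; the function names are hypothetical.

```python
import numpy as np

def delta_discriminant(A):
    """Delta via the discriminant (5.3); A[0], A[1] are the slabs A_1, A_2."""
    A1, A2 = A[0], A[1]
    middle = (np.linalg.det(A1 + A2) - np.linalg.det(A1 - A2)) / 2.0
    return middle**2 - 4.0 * np.linalg.det(A1) * np.linalg.det(A2)

def delta_explicit(A):
    """Delta via the explicit quartic; a(i, j, k) is the 1-based entry a_ijk."""
    a = lambda i, j, k: A[i - 1, j - 1, k - 1]
    squares = (a(1,1,1)**2 * a(2,2,2)**2 + a(1,1,2)**2 * a(2,2,1)**2
               + a(1,2,1)**2 * a(2,1,2)**2 + a(1,2,2)**2 * a(2,1,1)**2)
    pairs = (a(1,1,1)*a(1,1,2)*a(2,2,1)*a(2,2,2) + a(1,1,1)*a(1,2,1)*a(2,1,2)*a(2,2,2)
             + a(1,1,1)*a(1,2,2)*a(2,1,1)*a(2,2,2) + a(1,1,2)*a(1,2,1)*a(2,1,2)*a(2,2,1)
             + a(1,1,2)*a(1,2,2)*a(2,2,1)*a(2,1,1) + a(1,2,1)*a(1,2,2)*a(2,1,2)*a(2,1,1))
    quads = (a(1,1,1)*a(1,2,2)*a(2,1,2)*a(2,2,1)
             + a(1,1,2)*a(1,2,1)*a(2,1,1)*a(2,2,2))
    return squares - 2.0 * pairs + 4.0 * quads

# A random test tensor (the seed is an arbitrary choice for reproducibility).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2, 2))
print(delta_discriminant(A), delta_explicit(A))
```

Both routes give the same value up to rounding; the determinant form is shorter, while the explicit form makes the symmetry in the three indices easier to inspect.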

Proposition 5.6. Let $A \in \mathbb{R}^{2\times2\times2}$, let $A'$ be obtained from A by permuting the
three factors in the tensor product, and let $(L_1, L_2, L_3) \in \mathrm{GL}_{2,2,2}(\mathbb{R})$. Then $\Delta(A') = \Delta(A)$
and $\Delta((L_1, L_2, L_3) \cdot A) = \det(L_1)^2 \det(L_2)^2 \det(L_3)^2\, \Delta(A)$.

Proof. To show that Δ is invariant under all permutations of the factors of $\mathbb{R}^{2\times2\times2}$,
it is enough to check invariance in the cases of two distinct transpositions. It is clear
from equation (5.3) that Δ is invariant under the transposition of the second and
third factors, since this amounts to replacing $A_1, A_2$ with their transposes $A_1^\top, A_2^\top$.
To show that Δ is invariant under transposition of the first and third factors, write
$A = [u_{11}, u_{12} \mid u_{21}, u_{22}]$, where the $u_{ij}$ are column vectors. One can verify that

\[ \begin{aligned} \Delta(A) ={} & \det[u_{11}, u_{22}]^2 + \det[u_{21}, u_{12}]^2 \\ & - 2\det[u_{11}, u_{12}]\det[u_{21}, u_{22}] - 2\det[u_{11}, u_{21}]\det[u_{12}, u_{22}] \end{aligned} \]

which has the desired symmetry.

In view of the permutation invariance of Δ, it is enough to verify the second
claim in the case $(L_1, L_2, L_3) = (I, L_2, I)$. Then $(L_1, L_2, L_3) \cdot A = [L_2A_1 \mid L_2A_2]$ and
an extra factor $\det(L_2)^2$ appears in all terms of equation (5.3), exactly as required. □
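Both claims of Proposition 5.6 are easy to spot-check numerically. The following sketch assumes NumPy, the storage convention `A[i-1, j-1, k-1]` = a_ijk, and three arbitrarily chosen invertible matrices; `multilinear_multiply` is a hypothetical helper implementing the action (L_1, L_2, L_3) · A.

```python
import numpy as np
from itertools import permutations

def delta(A):
    """Delta via the discriminant (5.3); A[0], A[1] are the slabs A_1, A_2."""
    A1, A2 = A[0], A[1]
    middle = (np.linalg.det(A1 + A2) - np.linalg.det(A1 - A2)) / 2.0
    return middle**2 - 4.0 * np.linalg.det(A1) * np.linalg.det(A2)

def multilinear_multiply(L1, L2, L3, A):
    """(L1, L2, L3) . A: apply L1, L2, L3 along the first, second, third index."""
    return np.einsum("ia,jb,kc,abc->ijk", L1, L2, L3, A)

rng = np.random.default_rng(1)          # arbitrary seed
A = rng.standard_normal((2, 2, 2))

# Arbitrary invertible matrices with determinants 2, -4 and 2.
L1 = np.array([[2.0, 1.0], [0.0, 1.0]])
L2 = np.array([[1.0, 3.0], [1.0, -1.0]])
L3 = np.array([[0.0, 1.0], [-2.0, 1.0]])

scale = (np.linalg.det(L1) * np.linalg.det(L2) * np.linalg.det(L3)) ** 2
lhs = delta(multilinear_multiply(L1, L2, L3, A))
rhs = scale * delta(A)

# Delta evaluated on all six permutations of the three factors.
permuted = [delta(np.transpose(A, p)) for p in permutations(range(3))]
print(lhs, rhs, permuted)
```

Since the scaling factor is a square, the sign of Δ is unchanged by any such transformation, which is the content of the corollaries that follow.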

Corollary 5.7. The sign of Δ is invariant under the action of $\mathrm{GL}_{2,2,2}(\mathbb{R})$.

Corollary 5.8. The value of Δ is invariant under the action of $\mathrm{O}_{2,2,2}(\mathbb{R})$.

Using the properties of Δ, we can easily prove, in a slightly different way, a result
due originally to Kruskal (unpublished work) and ten Berge [72].

Proposition 5.9. If $\Delta(A) > 0$ then $\operatorname{rank}_\otimes(A) \le 2$.

Proposition 5.10. If $\operatorname{rank}_\otimes(A) \le 2$ then $\Delta(A) \ge 0$.

Proof of Proposition 5.9. If the discriminant Δ(A) is positive then the homogeneous
quadratic equation (5.2) has two linearly independent root pairs $(\lambda_{11}, \lambda_{12})$ and
$(\lambda_{21}, \lambda_{22})$. It follows that we can use slab operations to transform $[A_1 \mid A_2] \to [B_1 \mid B_2]$,
where $B_i = \lambda_{i1}A_1 + \lambda_{i2}A_2$. By construction $\det(B_i) = 0$ so we can write $B_i = f_i \otimes g_i$
for some $f_i, g_i \in \mathbb{R}^2$ (possibly zero). It follows that $[B_1 \mid B_2] = e_1 \otimes f_1 \otimes g_1 + e_2 \otimes f_2 \otimes g_2$;
so $\operatorname{rank}_\otimes(A) = \operatorname{rank}_\otimes([B_1 \mid B_2]) \le 2$. □
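The proof of Proposition 5.9 is constructive, and the slab-operation argument can be turned into a small routine. The sketch below is one possible realization under stated assumptions: it uses NumPy, stores the slabs as `A[0]` and `A[1]`, picks the root pairs of (5.2) in a homogeneous form so that a vanishing det(A_1) or det(A_2) causes no division by zero, and factors each singular slab with an SVD. None of these implementation choices come from the paper, and the names are hypothetical.

```python
import numpy as np

def outer3(a, b, c):
    """Decomposable order-3 tensor a (x) b (x) c."""
    return np.einsum("i,j,k->ijk", a, b, c)

def rank2_decomposition(A):
    """Split a 2x2x2 tensor with Delta(A) > 0 into two rank-1 terms (c, f, g)."""
    A1, A2 = A[0], A[1]
    alpha, gamma = np.linalg.det(A1), np.linalg.det(A2)
    beta = (np.linalg.det(A1 + A2) - np.linalg.det(A1 - A2)) / 2.0
    disc = beta**2 - 4.0 * alpha * gamma          # this is Delta(A)
    if disc <= 0:
        raise ValueError("this construction needs Delta(A) > 0")
    root = np.sqrt(disc)

    pairs = []
    for sign in (+1.0, -1.0):
        # Two proportional expressions for the same root pair of
        # alpha*l1^2 + beta*l1*l2 + gamma*l2^2 = 0; keep the larger one.
        p = np.array([-beta + sign * root, 2.0 * alpha])
        q = np.array([2.0 * gamma, -beta - sign * root])
        pairs.append(p if np.linalg.norm(p) >= np.linalg.norm(q) else q)
    Lam = np.array(pairs)                          # rows (lambda_i1, lambda_i2)
    C = np.linalg.inv(Lam)                         # A_k = sum_i C[k, i] * B_i

    terms = []
    for i in range(2):
        B = Lam[i, 0] * A1 + Lam[i, 1] * A2        # det(B) = 0 by construction
        U, S, Vt = np.linalg.svd(B)
        terms.append((C[:, i], S[0] * U[:, 0], Vt[0]))   # B = f g^T up to rounding
    return terms

# Assumed example: a sum of two decomposable tensors with Delta = 9 > 0.
A = (outer3(np.array([1.0, 2.0]), np.array([1.0, 0.0]), np.array([2.0, 1.0]))
     + outer3(np.array([0.0, 1.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0])))
terms = rank2_decomposition(A)
rebuilt = sum(outer3(c, f, g) for c, f, g in terms)
print(np.max(np.abs(rebuilt - A)))
```

The example tensor has a singular first slab, so it exercises the case in which one root pair is (1, 0) up to scaling.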
Proof of Proposition 5.10. It is easy to check that $\Delta(A) = 0$ if $\operatorname{rank}_\otimes(A) \le 1$,
since we can write $A = (L_1, L_2, L_3) \cdot (e_1 \otimes e_1 \otimes e_1)$ or else $A = 0$.

It remains to be shown that Δ(A) is not negative when $\operatorname{rank}_\otimes(A) = 2$. Proposition 5.5
implies that A can be transformed by an element of $\mathrm{GL}_{2,2,2}(\mathbb{R})$ (and a
permutation of factors, if necessary) into one of the following tensors:

\[ I_1 = \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] \quad\text{or}\quad I_2 = \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array}\right]. \]

Since $\Delta(I_1) = 1$ and $\Delta(I_2) = 0$ it follows that $\Delta(A) \ge 0$. □

Kruskal and also ten Berge deserve complete credit for discovering the above
result. In fact, the hyperdeterminant for a 2 × 2 × 2 tensor A is known by the name
Kruskal polynomial in the psychometrics community [72]. Our goal is not so much to
provide alternative proofs for Propositions 5.9 and 5.10 but to include them so that
our proof of Theorem 5.1 can be self-contained. We are now ready to give that proof,
thereby characterizing all limit points of order-3 rank-2 tensors.

Proof of Theorem 5.1. Note that the theorem is stated for order-3 tensors
of any size $d_1 \times d_2 \times d_3$. We begin with the case $A \in \mathbb{R}^{2\times2\times2}$. Suppose
$A \in \overline{\mathcal{S}}_2(2, 2, 2) \setminus \mathcal{S}_2(2, 2, 2)$. Then we claim that $\Delta(A) = 0$. Indeed, since $A \notin \mathcal{S}_2$,
Proposition 5.9 implies that $\Delta(A) \le 0$. On the other hand, since $A \in \overline{\mathcal{S}}_2$, it follows from
Proposition 5.10 and the continuity of Δ that $\Delta(A) \ge 0$.

Since $\Delta(A) = 0$, the homogeneous quadratic equation (5.2) has a nontrivial root
pair $(\lambda_1, \lambda_2)$. It follows that A can be transformed by slab operations into the form
$[A_i \mid S]$ where $S = \lambda_1A_1 + \lambda_2A_2$ and i = 1 or 2. By construction $\det(S) = 0$, but
$S \ne 0$ for otherwise $\operatorname{rank}_\otimes(A) = \operatorname{rank}(A_i) \le 2$. Hence $\operatorname{rank}(S) = 1$ and by a further
transformation we can reduce A to the form:

\[ B = \left[\begin{array}{cc|cc} p & q & 1 & 0 \\ r & s & 0 & 0 \end{array}\right]. \]

In fact we may assume $p = 0$ (the operation 'subtract p times the second slab from
the first slab' will achieve this), and moreover $s^2 = \Delta(B) = 0$. Both q and r must be
non-zero, otherwise $\operatorname{rank}_\otimes(A) = \operatorname{rank}_\otimes(B) \le 2$. If we rescale the bottom rows by 1/r
and the right-hand columns by 1/q we are finally reduced to:

\[ B' = \left[\begin{array}{cc|cc} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{array}\right] = e_2 \otimes e_1 \otimes e_1 + e_1 \otimes e_2 \otimes e_1 + e_1 \otimes e_1 \otimes e_2. \]

By reversing all the row, column and slab operations we can obtain a transformation
$(L_1, L_2, L_3) \in \mathrm{GL}_{2,2,2}(\mathbb{R})$ such that $A = (L_1, L_2, L_3) \cdot B'$. Then A can be written in
the required form, with $x_i = L_i e_1$, $y_i = L_i e_2$ for i = 1, 2, 3.

This completes the proof of Theorem 5.1 in the case of the tensor space $\mathbb{R}^{2\times2\times2}$.
By Corollary 5.3 this implies the theorem in general. □

5.4. Ill-posedness and ill-conditioning of the best rank-r approximation
problem. Recall that a problem is called well-posed if a solution exists, is unique,
and is stable (i.e. depends continuously on the input data). If one or more of these
three criteria are not satisfied, the problem³ is called ill-posed.

3 Normally, existence is taken for granted and an ill-posed problem often means one whose solution
lacks either uniqueness or stability. In this paper, the ill-posedness is of a more serious kind: the
existence of a solution is itself in question.

From Sections 4 and 5, we see that tensors will often fail to have a best rank-r
approximation. In all applications that rely on APPROX(A, r) or a variant of it
as the underlying mathematical model, we should fully expect the ill-posedness of
APPROX(A, r) to pose a serious difficulty. Even if it is known a priori that a tensor A
has a best rank-r approximation, we should remember that in applications, the data
array $\hat{A}$ available at our disposal is almost always one that is corrupted by noise, i.e.
$\hat{A} = A + E$ where E denotes the collective contributions of various errors: limitations
in measurements, background noise, rounding off, etc. Clearly there is no guarantee
that $\hat{A}$ will also have a best rank-r approximation.

In many situations, one only needs a 'good' rank-r approximation rather than the 
best rank-r approximation. It is tempting to argue, then, that the non-existence of 
the best solution does not matter — it is enough to seek an 'approximate solution'. 
We discourage this point of view, for two main reasons. First, there is a serious 
conceptual difficulty: if there is no solution, then what is the 'approximate solution' an 
approximation of? Second, even if one disregards this, and ploughs ahead regardless 
to compute an 'approximate solution', we argue below that this task is ill-conditioned 
and the computation is unstable. 

For notational simplicity and since there is no loss of generality (cf. Theorem 4.10
and Corollary 4.12), we will use the problem of finding a best rank-2 approximation
to a rank-3 tensor to make our point. Let $A \in \mathbb{R}^{d_1\times d_2\times d_3}$ be an instance where

\[ \operatorname*{argmin}_{x_i, y_i \in \mathbb{R}^{d_i}} \|A - x_1 \otimes x_2 \otimes x_3 - y_1 \otimes y_2 \otimes y_3\| \tag{5.4} \]

does not have a solution (such examples abound, cf. Section 5). If we disregard the
fact that a solution does not exist and plug the problem into a computer program⁴,
we will still get some sort of 'approximate solution' because of the finite-precision
error inherent in the computer. What really happens here [77] is that we are effectively
solving a problem perturbed by some small $\varepsilon > 0$; the 'approximate solution'
$x_i^*(\varepsilon), y_i^*(\varepsilon) \in \mathbb{R}^{d_i}$ (i = 1, 2, 3) is really a solution to the perturbed problem:

\[ \|A - x_1^*(\varepsilon) \otimes x_2^*(\varepsilon) \otimes x_3^*(\varepsilon) - y_1^*(\varepsilon) \otimes y_2^*(\varepsilon) \otimes y_3^*(\varepsilon)\| = \varepsilon + \inf_{x_i, y_i \in \mathbb{R}^{d_i}} \|A - x_1 \otimes x_2 \otimes x_3 - y_1 \otimes y_2 \otimes y_3\|. \tag{5.5} \]

Since we are attempting to find a solution of (5.4) that does not exist, in exact
arithmetic the algorithm will never terminate, but in reality the computer is limited by
its finite precision and so the algorithm terminates at an 'approximate solution', which
may be viewed as a solution to a perturbed problem (5.5). This process of forcing
a solution to an ill-posed problem is almost always guaranteed to be ill-conditioned
because of the infamous rule of thumb in numerical analysis [22, 23, 24]:

A well-posed problem near to an ill-posed one is ill-conditioned.

4 While there is no known globally convergent algorithm for APPROX(A, r), we will ignore this
difficulty for a moment and assume that the ubiquitous alternating least squares algorithm would
yield the required solution.

The root of the ill-conditioning lies in the fact that we are solving the (well-posed but
ill-conditioned) problem (5.5) that is a slight perturbation of the ill-posed problem
(5.4). The ill-conditioning manifests itself as the phenomenon described in Proposition 4.8,
namely,

\[ \|x_1^*(\varepsilon) \otimes x_2^*(\varepsilon) \otimes x_3^*(\varepsilon)\| \to \infty \quad\text{and}\quad \|y_1^*(\varepsilon) \otimes y_2^*(\varepsilon) \otimes y_3^*(\varepsilon)\| \to \infty \]

as $\varepsilon \to 0$. The ill-conditioning described here was originally observed in numerical
experiments by psychometricians and chemometricians, who named the phenomenon
'diverging CANDECOMP / PARAFAC components' or 'candecomp/parafac degener- 
acy' ggn5HE2|l57lES|. 

To fix the ill-conditioning, we should first fix the ill-posedness, i.e. find a well- 
posed problem. This leads us to the subject of the next section. 

5.5. Weak solutions. In the study of partial differential equations [29], there
often arise systems of PDEs that have no solutions in the traditional sense. A standard
way around this is to define a so-called weak solution, which may not be a continuous
function or even a function (which is a tad odd since one would expect a solution to
a PDE to be at least differentiable). Without going into the details, we will just say
that the weak solution turns out to be an extremely useful concept and is indispensable
in modern studies of PDEs. Under the proper context, a weak solution to an ill-posed
PDE may be viewed as the limit of strong or classical solutions to a sequence of
well-posed PDEs that are slightly perturbed versions of the ill-posed one in question.
Motivated by the PDE analogies, we will define weak solutions to APPROX(A, r).

We let $\mathcal{S}_r(d_1, \dots, d_k) := \{A \in \mathbb{R}^{d_1\times\cdots\times d_k} \mid \operatorname{rank}_\otimes(A) \le r\}$ and let $\overline{\mathcal{S}}_r(d_1, \dots, d_k)$
denote its closure in the (unique) norm topology.

Definition 5.11. An order-k tensor $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ has border rank r if

\[ A \in \overline{\mathcal{S}}_r(d_1, \dots, d_k) \quad\text{and}\quad A \notin \overline{\mathcal{S}}_{r-1}(d_1, \dots, d_k). \]

This is denoted by $\underline{\operatorname{rank}}_\otimes(A)$. Note that

\[ \overline{\mathcal{S}}_r(d_1, \dots, d_k) = \{A \in \mathbb{R}^{d_1\times\cdots\times d_k} \mid \underline{\operatorname{rank}}_\otimes(A) \le r\}. \]

Remark. Clearly $\underline{\operatorname{rank}}_\otimes(A) \le \operatorname{rank}_\otimes(A)$ for any tensor A. Since $\overline{\mathcal{S}}_0 = \mathcal{S}_0$ (trivially)
and $\overline{\mathcal{S}}_1 = \mathcal{S}_1$ (by Proposition 4.2), it follows that $\underline{\operatorname{rank}}_\otimes(A) = \operatorname{rank}_\otimes(A)$ whenever
$\operatorname{rank}_\otimes(A) \le 2$. Moreover, $\underline{\operatorname{rank}}_\otimes(A) \ge 2$ if $\operatorname{rank}_\otimes(A) \ge 2$.

Our definition differs slightly from the usual definition of border rank in the
algebraic computational complexity literature [5, 6, 12, 48, 54], which uses the Zariski
topology (and is normally defined for tensors over $\mathbb{C}$).

Let $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ with $d_i \ge 2$ and $k \ge 3$. Then the way to ensure that
APPROX(A, r), the optimal rank-r approximation problem

\[ \operatorname*{argmin}_{\operatorname{rank}_\otimes(B) \le r} \|A - B\| \tag{5.6} \]

always has a meaningful solution for any $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ is to instead consider the
optimal border-rank-r approximation problem

\[ \operatorname*{argmin}_{\underline{\operatorname{rank}}_\otimes(B) \le r} \|A - B\|. \tag{5.7} \]

It is an obvious move to propose to fix the ill-posedness of APPROX(A, r) by taking
the closure. However, without a characterization of the limit points such a proposal
will at best be academic: it is not enough to simply say that weak solutions are limits
of rank-2 tensors, without giving an explicit expression (or a number of expressions)
for them that may be plugged into the objective function to be minimized.

Theorem 5.1 solves this problem in the order-3 rank-2 case: it gives a complete
description of these limit points with an explicit formula and, in turn, a constructive
solution to the border-rank approximation problem. In case this is not obvious, we
will spell out the implication of Theorem 5.1.

Corollary 5.12. Let $d_1, d_2, d_3 \ge 2$. Let $A \in \mathbb{R}^{d_1\times d_2\times d_3}$ with $\operatorname{rank}_\otimes(A) = 3$. A
is the limit of a sequence $A_n \in \mathbb{R}^{d_1\times d_2\times d_3}$ with $\operatorname{rank}_\otimes(A_n) \le 2$ if and only if

\[ A = y_1 \otimes x_2 \otimes x_3 + x_1 \otimes y_2 \otimes x_3 + x_1 \otimes x_2 \otimes y_3 \]

for some $x_i, y_i$ linearly independent vectors in $\mathbb{R}^{d_i}$, i = 1, 2, 3.

This implies that every tensor in $\overline{\mathcal{S}}_2(d_1, d_2, d_3)$ can be written in one of two
forms:

\[ y_1 \otimes x_2 \otimes x_3 + x_1 \otimes y_2 \otimes x_3 + x_1 \otimes x_2 \otimes y_3 \tag{5.8} \]

or

\[ x_1 \otimes x_2 \otimes x_3 + y_1 \otimes y_2 \otimes y_3. \tag{5.9} \]

These expressions may then be used to define the relevant objective function(s) in the
minimization of (5.7). As in the case of PDE, every classical (strong) solution is also
a weak solution to APPROX(A, r).

Proposition 5.13. If B is a solution to (5.6) then B is a solution to (5.7).

Proof. If $\|A - B\| \le \|A - B'\|$ for all $B' \in \mathcal{S}_r$, then $\|A - B\| \le \|A - B'\|$ for all
$B' \in \overline{\mathcal{S}}_r$ by continuity. □

6. Semialgebraic description of tensor rank. One may wonder whether the
result in Propositions 5.9 and 5.10 extends to more general hyperdeterminants. We
know from [30, 31] that a hyperdeterminant may be uniquely defined (up to a constant
scaling) in $\mathbb{R}^{d_1\times\cdots\times d_k}$ whenever $d_1, \dots, d_k$ satisfy

\[ d_i - 1 \le \sum_{j \ne i} (d_j - 1) \quad\text{for } i = 1, \dots, k. \tag{6.1} \]

(Note that for matrices, (6.1) translates to $d_1 = d_2$, which may be viewed as one
reason why the determinant is defined only for square matrices.) Let $\operatorname{Det}_{d_1,\dots,d_k} :
\mathbb{R}^{d_1\times\cdots\times d_k} \to \mathbb{R}$ be the polynomial function defined by the hyperdeterminant, whenever
(6.1) is satisfied. Propositions 5.9 and 5.10 tell us that the rank of a tensor is 2
on the set $\{A \mid \operatorname{Det}_{2,2,2}(A) > 0\}$ and 3 on the set $\{A \mid \operatorname{Det}_{2,2,2}(A) < 0\}$. One may
start by asking whether the tensor rank in $\mathbb{R}^{d_1\times\cdots\times d_k}$ is constant-valued on the sets

\[ \{A \mid \operatorname{Det}_{d_1,\dots,d_k}(A) < 0\} \quad\text{and}\quad \{A \mid \operatorname{Det}_{d_1,\dots,d_k}(A) > 0\}. \]

The answer, as Sturmfels has kindly communicated to us [70], is no, with explicit
counterexamples in cases 2 × 2 × 2 × 2 and 3 × 3 × 3. We will not reproduce Sturmfels'
examples here (one reason is that $\operatorname{Det}_{2,2,2,2}$ already contains close to 3 million
monomial terms [35]) but instead refer our readers to his forthcoming paper.

We will prove that although there is no single polynomial Δ that will separate
$\mathbb{R}^{d_1\times\cdots\times d_k}$ into regions of constant rank as in the case of $\mathbb{R}^{2\times2\times2}$, there is always a
finite number of polynomials $\Delta_1, \dots, \Delta_m$ that will achieve this.

Before we state and prove the result, we will introduce a few notions and notations.
We will write $\mathbb{R}[X_1, \dots, X_m]$ for the ring of polynomials in m variables
$X_1, \dots, X_m$ with real coefficients. Subsequently, we will be considering polynomial
functions on tensor spaces and will index our variables in a consistent way (for example,
when discussing polynomial functions on $\mathbb{R}^{l\times m\times n}$, the polynomial ring in
question will be denoted $\mathbb{R}[X_{111}, X_{112}, \dots, X_{lmn}]$). Given $A = [a_{ijk}] \in \mathbb{R}^{l\times m\times n}$ and
$p(X_{111}, X_{112}, \dots, X_{lmn}) \in \mathbb{R}[X_{111}, X_{112}, \dots, X_{lmn}]$, p(A) will mean the obvious thing,
namely, $p(A) = p(a_{111}, a_{112}, \dots, a_{lmn}) \in \mathbb{R}$.

A polynomial map is a function $F : \mathbb{R}^n \to \mathbb{R}^m$, defined for each $a = [a_1, \dots, a_n]^\top \in \mathbb{R}^n$,
by $F(a) = [f_1(a), \dots, f_m(a)]^\top$ where $f_i \in \mathbb{R}[X_1, \dots, X_n]$ for all i = 1, ..., m.

A semialgebraic set in $\mathbb{R}^n$ is a union of finitely many sets of the form⁵

\[ \{a \in \mathbb{R}^n \mid p(a) = 0,\ q_1(a) > 0, \dots, q_\ell(a) > 0\} \]

where $\ell \in \mathbb{N}$ and $p, q_1, \dots, q_\ell \in \mathbb{R}[X_1, \dots, X_n]$. Note that we do not exclude the
possibility of p or any of the $q_i$ being constant (degree-0) polynomials. For example,
if p is the zero polynomial, then the first relation 0 = 0 is trivially satisfied and the
semialgebraic set will be an open set in $\mathbb{R}^n$.

It is easy to see that the class of all semialgebraic sets in $\mathbb{R}^n$ is closed under
finite unions, finite intersections, and taking complement. Moreover, if $S \subseteq \mathbb{R}^{n+1}$ is
a semialgebraic set and $\pi : \mathbb{R}^{n+1} \to \mathbb{R}^n$ is the projection onto the first n coordinates,
then $\pi(S)$ is also a semialgebraic set. This seemingly innocuous statement is in fact
the Tarski-Seidenberg theorem [6H [71], possibly the most celebrated result about 
semialgebraic sets. We will restate it in a (somewhat less common) form that better 
suits our purpose. 

Theorem 6.1 (Tarski-Seidenberg). If $S \subseteq \mathbb{R}^n$ is a semialgebraic set and
$F : \mathbb{R}^n \to \mathbb{R}^m$ is a polynomial map, then the image $F(S) \subseteq \mathbb{R}^m$ is also a semialgebraic
set.

These and other results about semialgebraic sets may be found in [THJ Chapter 2] , 
which, in addition, is a very readable introduction to semialgebraic geometry. 

Theorem 6.2. The set $\mathcal{R}_r(d_1, \dots, d_k) := \{A \in \mathbb{R}^{d_1\times\cdots\times d_k} \mid \operatorname{rank}_\otimes(A) = r\}$ is a
semialgebraic set.

Proof. Let $\psi_r : (\mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \times \cdots \times \mathbb{R}^{d_k})^r \to \mathbb{R}^{d_1\times d_2\times\cdots\times d_k}$ be defined by

\[ \psi_r(u_1, v_1, \dots, z_1; \dots; u_r, v_r, \dots, z_r) = u_1 \otimes v_1 \otimes \cdots \otimes z_1 + \cdots + u_r \otimes v_r \otimes \cdots \otimes z_r. \]

It is clear that the image of $\psi_r$ is exactly $\mathcal{S}_r(d_1, \dots, d_k) = \{A \mid \operatorname{rank}_\otimes(A) \le r\}$. It is
also clear that $\psi_r$ is a polynomial map.

It follows from Theorem 6.1 that $\mathcal{S}_r(d_1, \dots, d_k)$ is semialgebraic. This holds for
arbitrary r. So $\mathcal{R}_r(d_1, \dots, d_k) = \mathcal{S}_r(d_1, \dots, d_k) \setminus \mathcal{S}_{r-1}(d_1, \dots, d_k)$ is also semialgebraic.
□

Corollary 6.3. There exist $\Delta_0, \dots, \Delta_m \in \mathbb{R}[X_{1\cdots1}, \dots, X_{d_1\cdots d_k}]$ from which the
rank of a tensor $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ can be determined purely from the signs (i.e. + or −
or 0) of $\Delta_0(A), \dots, \Delta_m(A)$.

In the next section, we will see examples of such polynomials for the tensor space
$\mathbb{R}^{2\times2\times2}$. We will stop short of giving an explicit semialgebraic characterization of
rank, but it should be clear to the reader how to get one.

7. Orbits of real 2 × 2 × 2 tensors. In this section, we study the equivalence
of tensors in $\mathbb{R}^{2\times2\times2}$ under multilinear matrix multiplication. We will use the results
and techniques of this section later on in Section 8 where we determine which tensors
in $\mathbb{R}^{2\times2\times2}$ have an optimal rank-2 approximation.

Recall that A and $B \in \mathbb{R}^{2\times2\times2}$ are said to be ($\mathrm{GL}_{2,2,2}(\mathbb{R})$-)equivalent iff there
exists a transformation $(L, M, N) \in \mathrm{GL}_{2,2,2}(\mathbb{R})$ such that $A = (L, M, N) \cdot B$. The
question is whether there is a finite list of 'canonical tensors' so that every $A \in \mathbb{R}^{2\times2\times2}$
is equivalent to one of them. For matrices $A \in \mathbb{R}^{m\times n}$, rank(A) = r if and only if
there exist $M \in \mathrm{GL}_m(\mathbb{R})$, $N \in \mathrm{GL}_n(\mathbb{R})$ such that

\[ (M, N) \cdot A = MAN^\top = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}. \]

So every matrix of rank r is equivalent to one that takes the canonical form $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$.
Note that this is the same as saying that the matrix A can be transformed into
$\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ using elementary row- and column-operations: adding a scalar multiple of a
row/column to another, scaling a row/column by a non-zero scalar, interchanging two
rows/columns, since every $(L_1, L_2) \in \mathrm{GL}_{m,n}(\mathbb{R})$ is a sequence of such operations.

5 Only one p is necessary, because multiple equality constraints $p_1(a) = 0, \dots, p_k(a) = 0$ can
always be amalgamated into a single equation $p(a) = 0$ by setting $p = p_1^2 + \cdots + p_k^2$.

We will see that there is indeed a finite number of canonical forms for tensors
in $\mathbb{R}^{2\times2\times2}$, although the classification is somewhat more intricate than the case of
matrices: two tensors in $\mathbb{R}^{2\times2\times2}$ can have the same rank but be inequivalent (i.e.
reduce to different canonical forms).

In fancier language, what we are doing is classifying the orbits of the group action
$\mathrm{GL}_{2,2,2}(\mathbb{R})$ on $\mathbb{R}^{2\times2\times2}$. We are doing for $\mathbb{R}^{2\times2\times2}$ what Gelfand, Kapranov, and
Zelevinsky did for $\mathbb{C}^{2\times2\times2}$ in the last sections of [30, 31]. Not surprisingly, the results
that we obtained are similar but not identical: there are eight distinct orbits for
the action of $\mathrm{GL}_{2,2,2}(\mathbb{R})$ on $\mathbb{R}^{2\times2\times2}$ as opposed to seven distinct orbits for the action
of $\mathrm{GL}_{2,2,2}(\mathbb{C})$ on $\mathbb{C}^{2\times2\times2}$, a further reminder of the dependence of such results on
the choice of field.

Theorem 7.1. Every tensor in $\mathbb{R}^{2\times2\times2}$ is equivalent via a transformation in
$\mathrm{GL}_{2,2,2}(\mathbb{R})$ to precisely one of the canonical forms indicated in Table 7.1, with its
invariants taking the values shown.

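Table 7.1
GL-orbits of $\mathbb{R}^{2\times2\times2}$. The letters D, G stand for 'degenerate' and 'generic' respectively.
Each tensor is written as its two slabs $[A_1 \mid A_2]$, with the rows separated by a semicolon.

tensor                              sign(Δ)   rank_⊞     rank_⊗   border rank
D_0   = [0 0 | 0 0 ; 0 0 | 0 0]       0       (0,0,0)      0          0
D_1   = [1 0 | 0 0 ; 0 0 | 0 0]       0       (1,1,1)      1          1
D_2   = [1 0 | 0 0 ; 0 1 | 0 0]       0       (1,2,2)      2          2
D_2'  = [1 0 | 0 1 ; 0 0 | 0 0]       0       (2,1,2)      2          2
D_2'' = [1 0 | 0 0 ; 0 0 | 1 0]       0       (2,2,1)      2          2
G_2   = [1 0 | 0 0 ; 0 0 | 0 1]       +       (2,2,2)      2          2
D_3   = [1 0 | 0 1 ; 0 0 | 1 0]       0       (2,2,2)      3          2
G_3   = [1 0 | 0 -1 ; 0 1 | 1 0]      −       (2,2,2)      3          3

Proof. Write $A = [A_1 \mid A_2]$, $A_i \in \mathbb{R}^{2\times2}$, for $A = [a_{ijk}] \in \mathbb{R}^{2\times2\times2}$. If $\operatorname{rank}(A_1) = 0$,
then

\[ A = \left[\begin{array}{cc|cc} 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{array}\right]. \]

Using matrix operations, A must then be equivalent to one of the following forms
(depending on $\operatorname{rank}(A_2)$)

\[ \left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{cc|cc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{cc|cc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] \]

which correspond to $D_0, D_1, D_2$ respectively (after reordering the slabs).

If $\operatorname{rank}(A_1) = 1$, then we may assume that

\[ A = \left[\begin{array}{cc|cc} 1 & 0 & a & b \\ 0 & 0 & c & d \end{array}\right]. \]

If $d \ne 0$ then we may transform this to $G_2$ as follows:

\[ \left[\begin{array}{cc|cc} 1 & 0 & a & b \\ 0 & 0 & c & d \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & \times & 0 \\ 0 & 0 & 0 & d \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]. \]

If $d = 0$ then:

\[ \left[\begin{array}{cc|cc} 1 & 0 & a & b \\ 0 & 0 & c & 0 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & b \\ 0 & 0 & c & 0 \end{array}\right]. \]

In this situation we can normalize b, c separately, reducing these matrices to one of
the following four cases (according to whether b, c are zero):

\[ \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{cc|cc} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{cc|cc} 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] \]

which are $D_1, D_2', D_2'', D_3$ respectively.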

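Finally, if $\operatorname{rank}(A_1) = 2$, then we may assume that

\[ A = [A_1 \mid A_2] = \left[\begin{array}{cc|cc} 1 & 0 & \times & \times \\ 0 & 1 & \times & \times \end{array}\right]. \]

By applying a transformation of the form $(I, L, L^{-\top})$, we can keep $A_1$ fixed while
conjugating $A_2$ into (real) Jordan canonical form. There are four cases.

If $A_2$ has repeated real eigenvalues and is diagonalizable, then we get $D_2$:

\[ \left[\begin{array}{cc|cc} 1 & 0 & \lambda & 0 \\ 0 & 1 & 0 & \lambda \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array}\right]. \]

If $A_2$ has repeated real eigenvalues and is not diagonalizable, then we have

\[ \left[\begin{array}{cc|cc} 1 & 0 & \lambda & 1 \\ 0 & 1 & 0 & \lambda \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{array}\right] \]

which is equivalent (after swapping columns and swapping slabs) to $D_3$.

If $A_2$ has distinct real eigenvalues, then A reduces to $G_2$:

\[ \left[\begin{array}{cc|cc} 1 & 0 & \lambda & 0 \\ 0 & 1 & 0 & \mu \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & \mu - \lambda \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]. \]

If $A_2$ has complex eigenvalues, then we can reduce A to $G_3$:

\[ \left[\begin{array}{cc|cc} 1 & 0 & a & -b \\ 0 & 1 & b & a \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & -b \\ 0 & 1 & b & 0 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \end{array}\right]. \]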





Thus, every $2 \times 2 \times 2$ tensor can be transformed to one of the canonical forms
listed in the statement of the theorem. Moreover, the invariants $\operatorname{sign}(\Delta)$ and $\operatorname{rank}_\boxplus$
are easily computed for the canonical forms, and suffice to distinguish them. It follows
that the listed forms are pairwise inequivalent.

We confirm the given values of $\operatorname{rank}_\otimes$. It is clear that $\operatorname{rank}_\otimes(D_0) = 0$ and
$\operatorname{rank}_\otimes(D_1) = 1$. By Proposition 5.4 any tensor of rank 1 must be equivalent to $D_1$.
Thus $D_2$, $D_2'$, $D_2''$ and $G_2$ are all of rank 2. By Proposition 5.5 every tensor of rank 2
must be equivalent to one of these. In particular, $D_3$ and $G_3$ must have rank at
least 3. Evidently $\operatorname{rank}_\otimes(D_3) = 3$ from its definition; and the same is true for $G_3$ by
virtue of the less obvious relation

\[
G_3 = (e_1 + e_2) \otimes e_2 \otimes e_2 + (e_1 - e_2) \otimes e_1 \otimes e_1 + e_2 \otimes (e_1 + e_2) \otimes (e_1 - e_2).
\]

Finally, we confirm the tabulated values of $\underline{\operatorname{rank}}_\otimes$. By virtue of the remark after
Definition 5.11 it is enough to verify that $\underline{\operatorname{rank}}_\otimes(D_3) \le 2$ and that $\underline{\operatorname{rank}}_\otimes(G_3) = 3$.
The first of these assertions follows from Proposition 4.6. The set of tensors of type $G_3$
is an open set, which implies the second assertion. □
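The rank-3 relation for $G_3$ can be checked entry by entry. The following is a minimal numerical sketch, assuming the slab convention $A = [A_1 \mid A_2]$ in which the slab $A_i$ holds the entries $a_{ijk}$; the helper name `outer3` is illustrative only.

```python
import numpy as np

def outer3(u, v, w):
    """Rank-1 tensor u (x) v (x) w as a 3-way array."""
    return np.einsum("i,j,k->ijk", u, v, w)

e1, e2 = np.eye(2)

# G3 in the (assumed) convention A = [A_1 | A_2], with A_i = (a_{ijk})_{jk}.
G3 = np.zeros((2, 2, 2))
G3[0] = [[1, 0], [0, 1]]    # first slab: identity
G3[1] = [[0, -1], [1, 0]]   # second slab: rotation by 90 degrees

# The three rank-1 terms of the relation stated in the proof.
decomposition = (outer3(e1 + e2, e2, e2)
                 + outer3(e1 - e2, e1, e1)
                 + outer3(e2, e1 + e2, e1 - e2))

decomposition_matches = np.array_equal(G3, decomposition)
print(decomposition_matches)
```

This only confirms that $G_3$ has rank at most 3; the lower bound comes from the classification argument in the proof.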

Remark. We note that $D_3$ is equivalent to any of the tensors obtained from it by
permutations of the three factors. Indeed, all of these tensors have $\operatorname{rank}_\boxplus = (2, 2, 2)$
and $\Delta = 0$. Similar remarks apply to $G_2$, $G_3$.

Remark. The classification of $\mathrm{GL}_{2,2,2}(\mathbb{C})$-orbits in $\mathbb{C}^{2\times 2\times 2}$ differs only in the
treatment of $G_3$, since there is no longer any distinction between real and complex
eigenvalues.

We caution the reader that the finite classification in Theorem 7.1 is, in general,
not possible for tensors of arbitrary size and order, simply because the dimension or
'degrees of freedom' of $\mathbb{R}^{d_1 \times \cdots \times d_k}$ exceeds that of $\mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$ as soon as
$d_1 \cdots d_k > d_1^2 + \cdots + d_k^2$ (which is almost always the case). Any attempt at an explicit classification
must necessarily include continuous parameters. For the case of $\mathbb{R}^{2\times 2\times 2}$ this argument
is not in conflict with our finite classification, since $2 \cdot 2 \cdot 2 < 2^2 + 2^2 + 2^2$.

7.1. Generic rank. We called the tensors in the orbit classes of $G_2$ and $G_3$
generic in the sense that the property of being in either one of these classes is an open
condition. One should note that there is often no single generic outer product rank
for tensors over $\mathbb{R}$ [50], [73]. (For tensors over $\mathbb{C}$ such a generic rank always exists [18].)
The 'generic outer product rank' for tensors over $\mathbb{R}$ should be regarded as set-valued:

\[
\text{generic-rank}_\otimes(\mathbb{R}^{d_1 \times \cdots \times d_k}) = \{ r \in \mathbb{N} \mid S_r(d_1, \dots, d_k) \text{ has non-empty interior} \}.
\]






So the generic outer product rank in $\mathbb{R}^{2\times 2\times 2}$ is $\{2, 3\}$. Another term, preferred by
some and coined originally by ten Berge, is typical rank [73].

Given $d_1, \dots, d_k$, the determination of the generic outer product rank is an open
problem in general and a nontrivial problem even in simple cases; see [13, 14] for
results over $\mathbb{C}$ and [72], [73] for results over $\mathbb{R}$. Fortunately, the difficulty does not
extend to multilinear rank: a single unique generic multilinear rank always exists
and depends only on $d_1, \dots, d_k$ (and not on the base field, cf. Proposition 7.4).

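Proposition 7.2. Let $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$. If $\operatorname{rank}_\boxplus(A) = (r_1(A), \dots, r_k(A))$, then

\[
r_i(A) = \min\Bigl( d_i, \prod_{j \ne i} d_j \Bigr)
\]

generically.

Proof. Let $\mu_i : \mathbb{R}^{d_1 \times \cdots \times d_k} \to \mathbb{R}^{d_i \times \prod_{j \ne i} d_j}$ be the forgetful map that 'flattens' or
'unfolds' a tensor into a matrix in the $i$th mode. It is easy to see that

\[
r_i(A) = \operatorname{rank}(\mu_i(A)) \tag{7.1}
\]

where 'rank' here denotes matrix rank. The results then follow from the fact that the
generic rank of a matrix in $\mathbb{R}^{d_i \times \prod_{j \ne i} d_j}$ is $\min(d_i, \prod_{j \ne i} d_j)$. □

For example, for order-3 tensors,

\[
\text{generic-rank}_\boxplus(\mathbb{R}^{l \times m \times n}) = (\min(l, mn), \min(m, ln), \min(n, lm)).
\]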
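Formula (7.1) translates directly into a computation. The sketch below, with illustrative helper names `unfold`, `multilinear_rank` and `generic_multilinear_rank` that are not taken from the paper, flattens a random Gaussian tensor in each mode and compares the matrix ranks with the value predicted by Proposition 7.2. A random tensor is assumed to be generic, which holds with probability one.

```python
import numpy as np

def unfold(A, mode):
    """Mode-`mode` flattening: a d_mode x (product of the other dims) matrix."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def multilinear_rank(A, tol=1e-10):
    """Tuple of matrix ranks of the flattenings, as in (7.1)."""
    return tuple(int(np.linalg.matrix_rank(unfold(A, m), tol=tol))
                 for m in range(A.ndim))

def generic_multilinear_rank(dims):
    """r_i = min(d_i, product of the other d_j), as in Proposition 7.2."""
    total = int(np.prod(dims))
    return tuple(min(d, total // d) for d in dims)

rng = np.random.default_rng(0)
dims = (2, 3, 7)
A = rng.standard_normal(dims)

observed = multilinear_rank(A)
predicted = generic_multilinear_rank(dims)
print(observed, predicted)
```

The dimensions $(2, 3, 7)$ are chosen so that the third mode is constrained by the product of the other two, $\min(7, 6) = 6$, rather than by its own size.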

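7.2. Semialgebraic description of orbit classes. For a general tensor $A \in
\mathbb{R}^{2\times 2\times 2}$, its orbit class is readily determined by computing the invariants $\operatorname{sign}(\Delta(A))$
and $\operatorname{rank}_\boxplus(A)$, and comparing with the canonical forms. The ranks $r_i(A)$ which constitute
$\operatorname{rank}_\boxplus(A)$ can be evaluated algebraically as follows. If $A \ne 0$ then each $r_i(A)$
is either 1 or 2. For example, note that $r_1(A) < 2$ if and only if the vectors $A_{\bullet 11}$,
$A_{\bullet 12}$, $A_{\bullet 21}$, $A_{\bullet 22}$ are pairwise linearly dependent, which happens if and only if all the 2-by-2
minors of the matrix

\[
\begin{bmatrix} a_{111} & a_{112} & a_{121} & a_{122} \\ a_{211} & a_{212} & a_{221} & a_{222} \end{bmatrix}
\]

are zero. Explicitly, the following six equations must be satisfied:

\[
\begin{aligned}
a_{111}a_{212} &= a_{211}a_{112}, & a_{111}a_{221} &= a_{211}a_{121}, & a_{111}a_{222} &= a_{211}a_{122}, \\
a_{112}a_{221} &= a_{212}a_{121}, & a_{112}a_{222} &= a_{212}a_{122}, & a_{121}a_{222} &= a_{221}a_{122}.
\end{aligned} \tag{7.2}
\]

Similarly, $r_2(A) < 2$ if and only if

\[
\begin{aligned}
a_{111}a_{122} &= a_{121}a_{112}, & a_{111}a_{221} &= a_{121}a_{211}, & a_{111}a_{222} &= a_{121}a_{212}, \\
a_{112}a_{221} &= a_{122}a_{211}, & a_{112}a_{222} &= a_{122}a_{212}, & a_{211}a_{222} &= a_{221}a_{212};
\end{aligned} \tag{7.3}
\]

and $r_3(A) < 2$ if and only if

\[
\begin{aligned}
a_{111}a_{122} &= a_{112}a_{121}, & a_{111}a_{212} &= a_{112}a_{211}, & a_{111}a_{222} &= a_{112}a_{221}, \\
a_{121}a_{212} &= a_{122}a_{211}, & a_{121}a_{222} &= a_{122}a_{221}, & a_{211}a_{222} &= a_{212}a_{221}.
\end{aligned} \tag{7.4}
\]

The equations (7.2)-(7.4) lead to twelve distinct polynomials (beginning with $\Delta_1 =
a_{111}a_{212} - a_{211}a_{112}$) which, together with $\Delta_0 := \Delta$, make up the collection $\Delta_0, \dots, \Delta_{12}$
of polynomials used in the semialgebraic description of the orbit structure of $\mathbb{R}^{2\times 2\times 2}$,
as in Corollary 6.3. Indeed, we note that in Table 7.1 the information in the fourth and
fifth columns ($\operatorname{rank}_\otimes(A)$, $\underline{\operatorname{rank}}_\otimes(A)$) is determined by the information in the second
and third columns ($\operatorname{sign}(\Delta)$, $\operatorname{rank}_\boxplus(A)$).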
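The procedure described in this subsection can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: $\Delta$ is taken to be Cayley's hyperdeterminant (written out in its standard expanded form), the ranks $r_i$ are computed numerically from the flattenings instead of from the minors in (7.2)-(7.4), and the canonical forms are entered in the slab convention $[A_1 \mid A_2]$ with $A_i = (a_{ijk})_{jk}$. The names `hyperdet`, `multilinear_rank` and `orbit_type` are hypothetical.

```python
import numpy as np

def hyperdet(a):
    """Cayley's hyperdeterminant of a 2x2x2 array a[i, j, k] (zero-based)."""
    squares = (a[0,0,0]**2 * a[1,1,1]**2 + a[0,0,1]**2 * a[1,1,0]**2
               + a[0,1,0]**2 * a[1,0,1]**2 + a[0,1,1]**2 * a[1,0,0]**2)
    pairs = (a[0,0,0]*a[0,0,1]*a[1,1,0]*a[1,1,1] + a[0,0,0]*a[0,1,0]*a[1,0,1]*a[1,1,1]
             + a[0,0,0]*a[0,1,1]*a[1,0,0]*a[1,1,1] + a[0,0,1]*a[0,1,0]*a[1,0,1]*a[1,1,0]
             + a[0,0,1]*a[0,1,1]*a[1,1,0]*a[1,0,0] + a[0,1,0]*a[0,1,1]*a[1,0,1]*a[1,0,0])
    quads = a[0,0,0]*a[0,1,1]*a[1,0,1]*a[1,1,0] + a[0,0,1]*a[0,1,0]*a[1,0,0]*a[1,1,1]
    return squares - 2 * pairs + 4 * quads

def multilinear_rank(A, tol=1e-10):
    """Matrix ranks of the three flattenings of A."""
    return tuple(int(np.linalg.matrix_rank(
        np.moveaxis(A, m, 0).reshape(2, -1), tol=tol)) for m in range(3))

def orbit_type(A, tol=1e-10):
    """Orbit class from sign(Delta) and the multilinear rank."""
    d = hyperdet(A)
    if d > tol:
        return "G2"
    if d < -tol:
        return "G3"
    degenerate = {(0, 0, 0): "D0", (1, 1, 1): "D1", (1, 2, 2): "D2",
                  (2, 1, 2): "D2'", (2, 2, 1): "D2''", (2, 2, 2): "D3"}
    return degenerate[multilinear_rank(A, tol)]

# Canonical forms as pairs of slabs [A_1 | A_2] (assumed convention).
canonical = {
    "D0":   [[[0, 0], [0, 0]], [[0, 0], [0, 0]]],
    "D1":   [[[1, 0], [0, 0]], [[0, 0], [0, 0]]],
    "D2":   [[[1, 0], [0, 1]], [[0, 0], [0, 0]]],
    "D2'":  [[[1, 0], [0, 0]], [[0, 1], [0, 0]]],
    "D2''": [[[1, 0], [0, 0]], [[0, 0], [1, 0]]],
    "D3":   [[[1, 0], [0, 0]], [[0, 1], [1, 0]]],
    "G2":   [[[1, 0], [0, 0]], [[0, 0], [0, 1]]],
    "G3":   [[[1, 0], [0, 1]], [[0, -1], [1, 0]]],
}
results = {name: orbit_type(np.array(T, dtype=float)) for name, T in canonical.items()}
print(results)
```

In floating point the test $\Delta = 0$ needs a tolerance, so a tensor lying very close to the hypersurface may be assigned a degenerate type; an exact implementation would evaluate the polynomials $\Delta_0, \dots, \Delta_{12}$ symbolically.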
7.3. Generic rank on $\Delta = 0$. The notion of generic rank also makes sense on
subvarieties of $\mathbb{R}^{2\times 2\times 2}$, for instance on the $\Delta = 0$ hypersurface.

Proposition 7.3. The tensors on the hypersurface $\mathcal{D}_3 = \{ A \in \mathbb{R}^{2\times 2\times 2} \mid \Delta(A) =
0 \}$ are all of the form

\[
x_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes x_3
\]

and they have rank 3 generically.

Proof. From the canonical forms in Table 7.1 we see that if $\Delta(A) = 0$, then

\[
A = x_1 \otimes x_2 \otimes y_3 + x_1 \otimes y_2 \otimes x_3 + y_1 \otimes x_2 \otimes x_3
\]

for some $x_i, y_i \in \mathbb{R}^2$, not necessarily linearly independent. It remains to be shown
that $\operatorname{rank}_\otimes(A) = 3$ generically.

From Theorem 7.1 and the subsequent discussion, if $\Delta(A) = 0$ then $\operatorname{rank}_\otimes(A) \le 2$
if and only if at least one of the equation sets (7.2), (7.3), (7.4) is satisfied. Hence
$\mathcal{D}_2 := \{ A \mid \Delta(A) = 0, \ \operatorname{rank}_\otimes(A) \le 2 \}$ is an algebraic subset of $\mathcal{D}_3$.

On the other hand, $\mathcal{D}_3 \setminus \mathcal{D}_2$ is dense in $\mathcal{D}_3$ with respect to the Euclidean, and
hence the Zariski, topology. Indeed, each of the tensors $D_0$, $D_1$, $D_2$, $D_2'$, $D_2''$ can be
approximated by tensors of type $D_3$; for instance

\[
\left[\begin{array}{cc|cc} 1 & 0 & 0 & \varepsilon \\ 0 & 0 & 1 & 0 \end{array}\right]
\to
\left[\begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right] = D_2'' \quad \text{as } \varepsilon \to 0.
\]

Multiplying by an arbitrary $(L, M, N) \in \mathrm{GL}_{2,2,2}(\mathbb{R})$, it follows that any tensor in $\mathcal{D}_2$
can be approximated by tensors of type $D_3$.

It follows that the rank-3 tensors $\mathcal{D}_3 \setminus \mathcal{D}_2$ in $\mathcal{D}_3$ constitute a generic subset of $\mathcal{D}_3$,
in the Zariski sense (and hence in all the other usual senses). □

Remark. In fact, it can be shown that $\mathcal{D}_3$ is an irreducible variety. If we accept
that, then the fact that $\mathcal{D}_2$ is a proper subvariety of $\mathcal{D}_3$ immediately implies that
the rank-3 tensors form a generic subset of $\mathcal{D}_3$. The denseness argument becomes
unnecessary.

7.4. Base field dependence. It is interesting to observe that the $\mathrm{GL}_{2,2,2}(\mathbb{R})$-orbit
classes of $G_2$ and $G_3$ merge into a single orbit class over $\mathbb{C}$ (under the action of
$\mathrm{GL}_{2,2,2}(\mathbb{C})$). Explicitly, if we write $z_k = x_k + i y_k$ and $\bar z_k = x_k - i y_k$, then

\[
x_1 \otimes x_2 \otimes x_3 + x_1 \otimes y_2 \otimes y_3 - y_1 \otimes x_2 \otimes y_3 + y_1 \otimes y_2 \otimes x_3
= \tfrac{1}{2} \bigl( \bar z_1 \otimes z_2 \otimes \bar z_3 + z_1 \otimes \bar z_2 \otimes z_3 \bigr). \tag{7.5}
\]

The left-hand side is in the $\mathrm{GL}_{2,2,2}(\mathbb{R})$-orbit class of $G_3$ and has outer product rank 3 over $\mathbb{R}$.
The right-hand side is in the $\mathrm{GL}_{2,2,2}(\mathbb{C})$-orbit class of $G_2$ and has outer product rank 2 over $\mathbb{C}$.
To see why this is unexpected, recall that an $m \times n$ matrix with real entries has the
same rank whether we regard it as an element of $\mathbb{R}^{m \times n}$ or of $\mathbb{C}^{m \times n}$. Note however
that $G_2$ and $G_3$ have the same multilinear rank; this is not coincidental but is a
manifestation of the following result.

Proposition 7.4. The multilinear rank of a tensor is independent of the choice
of base field. If $\mathbb{K}$ is an extension field of $\Bbbk$, the value $\operatorname{rank}_\boxplus(A)$ is the same whether
$A$ is regarded as an element of $\Bbbk^{d_1 \times \cdots \times d_k}$ or of $\mathbb{K}^{d_1 \times \cdots \times d_k}$.

Proof. This follows immediately from (7.1) and the base field independence of
matrix rank. □
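Identity (7.5) is the statement that the real four-term tensor is the real part of $\bar z_1 \otimes z_2 \otimes \bar z_3$. A quick numerical sanity check on random real vectors, assuming the placement of conjugates $\tfrac12(\bar z_1 \otimes z_2 \otimes \bar z_3 + z_1 \otimes \bar z_2 \otimes z_3)$ obtained by expanding the product term by term:

```python
import numpy as np

def outer3(u, v, w):
    """Rank-1 tensor u (x) v (x) w (works for real or complex vectors)."""
    return np.einsum("i,j,k->ijk", u, v, w)

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 2))   # rows are x_1, x_2, x_3
y = rng.standard_normal((3, 2))   # rows are y_1, y_2, y_3
z = x + 1j * y                    # z_k = x_k + i y_k
zbar = np.conj(z)                 # conjugates x_k - i y_k

# Left-hand side of (7.5): four real rank-1 terms.
lhs = (outer3(x[0], x[1], x[2]) + outer3(x[0], y[1], y[2])
       - outer3(y[0], x[1], y[2]) + outer3(y[0], y[1], x[2]))

# Right-hand side: two complex rank-1 terms, conjugate to each other.
rhs = 0.5 * (outer3(zbar[0], z[1], zbar[2]) + outer3(z[0], zbar[1], z[2]))

identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

Because the two complex terms are conjugates of one another, their sum is automatically real, which is why a rank-2 complex expression can represent a real tensor whose real rank is 3.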
In 1969, Bergman [J] considered linear subspaces of matrix spaces, and showed
that the minimum rank on a subspace can become strictly smaller upon taking a field
extension. He gave a class of examples, the simplest instance being the 2-dimensional
subspace

\[
\left\{ s \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + t \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}
\]

of $\mathbb{R}^{2\times 2}$. Every (nonzero) matrix in this subspace has rank 2, but the complexified
subspace contains a matrix of rank 1. Intriguingly, this example is precisely the
subspace spanned by the slabs of $G_3$. We suspect a deeper connection.
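Bergman's example is small enough to check by hand: a real combination $sI + tJ$ has determinant $s^2 + t^2$, which vanishes only at the origin, whereas the complex combination $I + iJ$ is singular. A short sketch (the variable names are illustrative):

```python
import numpy as np

I2 = np.eye(2)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

# Real members of the subspace: det(s*I + t*J) = s^2 + t^2 > 0 unless s = t = 0.
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-2.0, 0.5), (0.3, -4.0)]
real_dets = [np.linalg.det(s * I2 + t * J) for s, t in samples]

# A member of the complexified subspace: I + i*J = [[1, i], [-i, 1]].
complex_member = I2 + 1j * J
rank_over_C = int(np.linalg.matrix_rank(complex_member))

print(real_dets, rank_over_C)
```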

7.5. Injectivity of orbits. The tensor rank has the property of being invariant
under the general multilinear group (cf. (2.15)). Indeed, much of its relevance comes
from this fact. Moreover, from Proposition 3.1 we know that tensor rank is preserved
when a tensor space is included in a larger tensor space. Similar assertions are true
for the multilinear rank (cf. (2.19)).

The situation is more complicated for the function $\Delta$ defined on $\mathbb{R}^{2\times 2\times 2}$. The sign
of $\Delta$ is $\mathrm{GL}_{2,2,2}(\mathbb{R})$-invariant, and $\Delta$ itself is invariant under $\mathrm{O}_{2,2,2}(\mathbb{R})$. For general
$d_1, d_2, d_3 \ge 2$, we do not have an obvious candidate function $\Delta$ defined on $\mathbb{R}^{d_1 \times d_2 \times d_3}$.
However, there is a natural definition of $\Delta$ restricted to the subset of tensors $A$ for
which $\operatorname{rank}_\boxplus(A) \le (2, 2, 2)$. Such a tensor can be expressed as

\[
A = (L, M, N) \cdot (B \oplus 0)
\]

where $B \in \mathbb{R}^{2\times 2\times 2}$, $0 \in \mathbb{R}^{(d_1 - 2) \times (d_2 - 2) \times (d_3 - 2)}$ and $(L, M, N) \in \mathrm{O}_{d_1,d_2,d_3}(\mathbb{R})$. We
provisionally define $\Delta(A) = \Delta(B)$, subject to a check that this is independent of the
choices involved. Given an alternative expression $A = (L', M', N') \cdot (B' \oplus 0)$, it follows
that $B \oplus 0$ and $B' \oplus 0$ are in the same $\mathrm{O}_{d_1,d_2,d_3}(\mathbb{R})$-orbit. Indeed:

\[
B \oplus 0 = (L^{-1}L', M^{-1}M', N^{-1}N') \cdot (B' \oplus 0).
\]

If we can show, more strongly, that $B$, $B'$ belong to the same $\mathrm{O}_{2,2,2}(\mathbb{R})$-orbit, then
the desired equality $\Delta(B) = \Delta(B')$ follows from the orthogonal invariance of $\Delta$.

The missing step is supplied by the next theorem, which we state in a basis-free
form for abstract vector spaces. If $V$ is a vector space, we write $\mathrm{GL}(V)$ for the group
of invertible linear maps $V \to V$. If, in addition, $V$ is an inner-product space,
we write $\mathrm{O}(V)$ for the group of norm-preserving linear maps $V \to V$. In particular,
$\mathrm{GL}(\mathbb{R}^d) \cong \mathrm{GL}_d(\mathbb{R})$ and $\mathrm{O}(\mathbb{R}^d) \cong \mathrm{O}_d(\mathbb{R})$.

Theorem 7.5 (injectivity of orbits). Let $\Bbbk = \mathbb{R}$ or $\mathbb{C}$ and $V_1, \dots, V_k$ be $\Bbbk$-vector
spaces. Let $U_1 \le V_1, \dots, U_k \le V_k$.
(1) Suppose $B, B' \in U_1 \otimes \cdots \otimes U_k$ are in distinct $\mathrm{GL}(U_1) \times \cdots \times \mathrm{GL}(U_k)$-orbits of $U_1 \otimes \cdots \otimes U_k$; then $B$ and $B'$ are in distinct
$\mathrm{GL}(V_1) \times \cdots \times \mathrm{GL}(V_k)$-orbits of $V_1 \otimes \cdots \otimes V_k$.
(2) Suppose $B, B' \in U_1 \otimes \cdots \otimes U_k$ are
in distinct $\mathrm{O}(U_1) \times \cdots \times \mathrm{O}(U_k)$-orbits of $U_1 \otimes \cdots \otimes U_k$; then $B$ and $B'$ are in distinct
$\mathrm{O}(V_1) \times \cdots \times \mathrm{O}(V_k)$-orbits of $V_1 \otimes \cdots \otimes V_k$.

Lemma 7.6. Let $W \le U \le V$ be vector spaces and $L \in \mathrm{GL}(V)$. Suppose
$L(W) \le U$. Then there exists $\tilde L \in \mathrm{GL}(U)$ such that $\tilde L|_W = L|_W$. Moreover, if
$L \in \mathrm{O}(V)$ then we can take $\tilde L \in \mathrm{O}(U)$.

Proof. Extend $L|_W$ to $U$ by mapping the orthogonal complement of $W$ in $U$ by
a norm-preserving map to the orthogonal complement of $L(W)$ in $U$. The resulting
linear map $\tilde L$ has the desired properties and is orthogonal if $L$ is orthogonal. □






Proof of Theorem 7.5. We prove the contrapositive form of the theorem. Suppose
$B' = (L_1, \dots, L_k) \cdot B$, where $L_i \in \mathrm{GL}(V_i)$. Let $W_i \le U_i$ be minimal subspaces
such that $B$ is in the image of $W_1 \otimes \cdots \otimes W_k \hookrightarrow U_1 \otimes \cdots \otimes U_k$. It follows that
$L_i(W_i) \le U_i$, for otherwise we could replace $W_i$ by $L_i^{-1}(L_i(W_i) \cap U_i)$. We can now
use Lemma 7.6 to find $\tilde L_i \in \mathrm{GL}(U_i)$ which agree with $L_i$ on $W_i$. By construction,
$(\tilde L_1, \dots, \tilde L_k) \cdot B = (L_1, \dots, L_k) \cdot B = B'$. In the orthogonal case, where $L_i \in \mathrm{O}(V_i)$,
we may choose $\tilde L_i \in \mathrm{O}(U_i)$. □

Corollary 7.7. Let $\varphi$ be a $\mathrm{GL}_{d_1,\dots,d_k}(\mathbb{R})$-invariant (respectively $\mathrm{O}_{d_1,\dots,d_k}(\mathbb{R})$-invariant)
function on $\mathbb{R}^{d_1 \times \cdots \times d_k}$. Then $\varphi$ naturally extends to a $\mathrm{GL}_{d_1+e_1,\dots,d_k+e_k}(\mathbb{R})$-invariant
(respectively $\mathrm{O}_{d_1+e_1,\dots,d_k+e_k}(\mathbb{R})$-invariant) function on the subset

\[
\{ A \in \mathbb{R}^{(d_1+e_1) \times \cdots \times (d_k+e_k)} \mid r_i(A) \le d_i \text{ for } i = 1, \dots, k \}
\]

of $\mathbb{R}^{(d_1+e_1) \times \cdots \times (d_k+e_k)}$.

Proof. As with $\Delta$ above, write $A = (L_1, \dots, L_k) \cdot B$ for $B \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and define
$\varphi(A) = \varphi(B)$. By Theorem 7.5 this is independent of the choices involved. □

The problem of classification is closely related to finding invariant functions. We
end this section with a strengthening of Theorem 7.1.

Corollary 7.8. The eight orbits in Theorem 7.1 remain distinct under the embedding
$\mathbb{R}^{2\times 2\times 2} \hookrightarrow \mathbb{R}^{d_1 \times d_2 \times d_3}$ for any $d_1, d_2, d_3 \ge 2$. Thus, Theorem 7.1 immediately
gives a classification of tensors $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ with $\operatorname{rank}_\boxplus(A) \le (2, 2, 2)$, into eight
classes under $\mathrm{GL}_{d_1,d_2,d_3}(\mathbb{R})$-equivalence.

The corollary allows us to extend the notion of tensor-type to $\mathbb{R}^{d_1 \times d_2 \times d_3}$. For
instance, we will say that $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ has type $G_3$ iff $A$ is GL-equivalent to
$G_3 \in \mathbb{R}^{2\times 2\times 2} \subset \mathbb{R}^{d_1 \times d_2 \times d_3}$.

Note that order-$k$ tensors can be embedded in order-$(k+1)$ tensors by taking the
tensor product with a 1-dimensional factor. Distinct orbits remain distinct, so the
results of this subsection extend to inclusions into tensor spaces of higher order.
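The extension of $\Delta$ to tensors of multilinear rank at most $(2,2,2)$ can be illustrated numerically. The sketch below is one possible realization, not the paper's construction verbatim: it compresses $A$ to a $2 \times 2 \times 2$ core using orthonormal bases taken from the singular value decompositions of the flattenings, and evaluates Cayley's hyperdeterminant (assumed here to be the paper's $\Delta$) on the core. The function names are hypothetical.

```python
import numpy as np

def hyperdet(a):
    """Cayley's hyperdeterminant of a 2x2x2 array a[i, j, k] (zero-based)."""
    squares = (a[0,0,0]**2 * a[1,1,1]**2 + a[0,0,1]**2 * a[1,1,0]**2
               + a[0,1,0]**2 * a[1,0,1]**2 + a[0,1,1]**2 * a[1,0,0]**2)
    pairs = (a[0,0,0]*a[0,0,1]*a[1,1,0]*a[1,1,1] + a[0,0,0]*a[0,1,0]*a[1,0,1]*a[1,1,1]
             + a[0,0,0]*a[0,1,1]*a[1,0,0]*a[1,1,1] + a[0,0,1]*a[0,1,0]*a[1,0,1]*a[1,1,0]
             + a[0,0,1]*a[0,1,1]*a[1,1,0]*a[1,0,0] + a[0,1,0]*a[0,1,1]*a[1,0,1]*a[1,0,0])
    quads = a[0,0,0]*a[0,1,1]*a[1,0,1]*a[1,1,0] + a[0,0,1]*a[0,1,0]*a[1,0,0]*a[1,1,1]
    return squares - 2 * pairs + 4 * quads

def unfold(A, mode):
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def compress_to_core(A):
    """Orthogonally compress A (multilinear rank <= (2,2,2)) to a 2x2x2 core."""
    bases = [np.linalg.svd(unfold(A, m), full_matrices=False)[0][:, :2]
             for m in range(3)]
    return np.einsum("ia,jb,kc,ijk->abc", *bases, A)

def transform(A, L, M, N):
    """Multilinear action (L, M, N) . A."""
    return np.einsum("ia,jb,kc,abc->ijk", L, M, N, A)

rng = np.random.default_rng(3)
G3 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[0.0, -1.0], [1.0, 0.0]]])
embedded = np.zeros((3, 4, 5))
embedded[:2, :2, :2] = G3            # G3 (+) 0 inside R^{3x4x5}

# Orthogonal change of coordinates: Delta of the core should be unchanged.
Q = [np.linalg.qr(rng.standard_normal((d, d)))[0] for d in embedded.shape]
delta_rotated = hyperdet(compress_to_core(transform(embedded, *Q)))

# General invertible change of coordinates: only the sign should survive.
G = [rng.standard_normal((d, d)) for d in embedded.shape]
delta_general = hyperdet(compress_to_core(transform(embedded, *G)))

print(delta_rotated, delta_general)
```

The particular orthonormal bases returned by the SVD are arbitrary up to a $2 \times 2$ orthogonal factor in each mode, and that arbitrariness is exactly the ambiguity that Theorem 7.5 shows to be harmless.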

8. Volume of tensors with no optimal low-rank approximation. At this
point, it is clear that there exist tensors that fail to have optimal low-rank
approximations. However, it is our experience that practitioners have sometimes expressed
optimism that such failures might be rare abnormalities that are not encountered in
practice. In truth, such optimism is misplaced: the set of tensors with no optimal
low-rank approximation has positive volume. In other words, a randomly chosen tensor
will have a non-zero chance of failing to have an optimal low-rank approximation.

We begin this section with a particularly striking instance of this.

Theorem 8.1. No tensor of rank 3 in $\mathbb{R}^{2\times 2\times 2}$ has an optimal rank-2
approximation (with respect to the Frobenius norm). In particular, APPROX($A$, 2) has no
solution for tensors of type $G_3$, which comprise a set that is open and therefore of
positive volume.

Lemma 8.2. Let $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ with $\operatorname{rank}_\otimes(A) \ge r$. Suppose $B \in S_r(d_1, \dots, d_k)$
is an optimal rank-$r$ approximation for $A$. Then $\operatorname{rank}_\otimes(B) = r$.

Proof. Suppose $\operatorname{rank}_\otimes(B) \le r - 1$. Then $B \ne A$, and so $A - B$ has at least
one nonzero entry in its array representation. Let $E \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ be the rank-1
tensor which agrees with $A - B$ at that entry and is zero everywhere else. Then
$\operatorname{rank}_\otimes(B + E) \le r$ but $\|A - (B + E)\|_F < \|A - B\|_F$, so $B$ is not optimal. □
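The one-entry correction used in the proof of Lemma 8.2 is easy to simulate. In the sketch below (function name hypothetical) the correction $E$ is the single-entry tensor that cancels the largest entry of the residual $A - B$; any nonzero entry would do, and choosing the largest is merely a convenient assumption. A single-entry tensor is a scalar multiple of $e_i \otimes e_j \otimes e_k$ and therefore has rank 1.

```python
import numpy as np

def improve_by_one_entry(A, B):
    """Return (B + E, E) where E cancels the largest entry of the residual A - B."""
    residual = A - B
    idx = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
    E = np.zeros_like(residual)
    E[idx] = residual[idx]          # E = residual[idx] * e_i (x) e_j (x) e_k
    return B + E, E

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3, 3))
# A rank-1 starting approximation (deliberately not optimal).
u, v, w = rng.standard_normal((3, 3))
B = np.einsum("i,j,k->ijk", u, v, w)

B_new, E = improve_by_one_entry(A, B)
err_before = np.linalg.norm(A - B)
err_after = np.linalg.norm(A - B_new)
print(err_before, err_after)
```

The squared error drops by exactly the square of the cancelled entry, so a tensor of rank at most $r - 1$ that differs from $A$ can never be optimal among tensors of rank at most $r$.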

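Proof of Theorem 8.1. Let $A \in \mathbb{R}^{2\times 2\times 2}$ have rank 3, and suppose that $B$ is an
optimal rank-2 approximation to $A$. Propositions 5.9 and 5.10, together with the
continuity of $\Delta$, imply that $\Delta(B) = 0$. Lemma 8.2 implies that $\operatorname{rank}_\otimes(B) = 2$. By
Theorem 7.1 it follows that $B$ is of type $D_2$, $D_2'$ or $D_2''$.

We may assume without loss of generality that $B$ is of type $D_2$. The next step is to
put $B$ into a helpful form by making an orthogonal change of coordinates. This gives
an equivalent approximation problem, thanks to the O-invariance of the Frobenius
norm. From Table 7.1 we know that $\operatorname{rank}_\boxplus(B) = (1, 2, 2)$. Such a $B$ is orthogonally
equivalent to a tensor of the following form:

\[
\left[\begin{array}{cc|cc} \lambda & 0 & 0 & 0 \\ 0 & \mu & 0 & 0 \end{array}\right]. \tag{8.1}
\]

Indeed, a rotation in the first tensor factor brings $B$ entirely into the first slab, and
further rotations in the second and third factors put the resulting matrix into diagonal
form, with singular values $\lambda, \mu \ne 0$.

Henceforth we assume that $B$ is equal to the tensor in (8.1). We will consider
perturbations of the form $B + \varepsilon H$, which will be chosen so that $\Delta(B + \varepsilon H) = 0$ for
all $\varepsilon \in \mathbb{R}$. Then $B + \varepsilon H \in \overline{S}_2(2, 2, 2)$, and we must have

\[
\|A - B\|_F \le \|A - (B + \varepsilon H)\|_F
\]

for all $\varepsilon$. In fact

\[
\|A - (B + \varepsilon H)\|_F^2 - \|A - B\|_F^2 = -2\varepsilon \langle A - B, H \rangle_F + \varepsilon^2 \|H\|_F^2
\]

so if this is to be nonnegative for all small values of $\varepsilon$, it is necessary that

\[
\langle A - B, H \rangle_F = 0. \tag{8.2}
\]

Tensors $H$ which satisfy the condition $\Delta(B + \varepsilon H) = 0$ include the following:

\[
\left[\begin{array}{cc|cc} \times & \times & 0 & 0 \\ \times & \times & 0 & 0 \end{array}\right], \quad
\left[\begin{array}{cc|cc} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right], \quad
\left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right], \quad
\left[\begin{array}{cc|cc} 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \mu \end{array}\right],
\]

since the resulting tensors have types $D_2$, $D_3$, $D_3$, and $D_2$ respectively.

Each of these gives a constraint on $A - B$, by virtue of (8.2). Putting the
constraints together, we find that

\[
A - B = \left[\begin{array}{cc|cc} 0 & 0 & \alpha\mu & 0 \\ 0 & 0 & 0 & -\alpha\lambda \end{array}\right],
\qquad
A = \left[\begin{array}{cc|cc} \lambda & 0 & \alpha\mu & 0 \\ 0 & \mu & 0 & -\alpha\lambda \end{array}\right]
\]

for some $\alpha \in \mathbb{R}$. Thus $A = (\lambda e_1 + \alpha\mu e_2) \otimes e_1 \otimes e_1 + (\mu e_1 - \alpha\lambda e_2) \otimes e_2 \otimes e_2$ has rank 2,
a contradiction. □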
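The key computations in this proof can be replayed numerically. The sketch below assumes that $\Delta$ is Cayley's hyperdeterminant, uses the slab convention $[A_1 \mid A_2]$ with $A_i = (a_{ijk})_{jk}$, and picks arbitrary sample values for $\lambda$, $\mu$, $\alpha$; the four perturbation directions are the ones listed above, with a random first slab standing in for the entries marked $\times$.

```python
import numpy as np

def hyperdet(a):
    """Cayley's hyperdeterminant of a 2x2x2 array a[i, j, k] (zero-based)."""
    squares = (a[0,0,0]**2 * a[1,1,1]**2 + a[0,0,1]**2 * a[1,1,0]**2
               + a[0,1,0]**2 * a[1,0,1]**2 + a[0,1,1]**2 * a[1,0,0]**2)
    pairs = (a[0,0,0]*a[0,0,1]*a[1,1,0]*a[1,1,1] + a[0,0,0]*a[0,1,0]*a[1,0,1]*a[1,1,1]
             + a[0,0,0]*a[0,1,1]*a[1,0,0]*a[1,1,1] + a[0,0,1]*a[0,1,0]*a[1,0,1]*a[1,1,0]
             + a[0,0,1]*a[0,1,1]*a[1,1,0]*a[1,0,0] + a[0,1,0]*a[0,1,1]*a[1,0,1]*a[1,0,0])
    quads = a[0,0,0]*a[0,1,1]*a[1,0,1]*a[1,1,0] + a[0,0,1]*a[0,1,0]*a[1,0,0]*a[1,1,1]
    return squares - 2 * pairs + 4 * quads

def outer3(u, v, w):
    return np.einsum("i,j,k->ijk", u, v, w)

lam, mu, alpha = 2.0, 0.5, 0.7            # sample values (assumptions)
rng = np.random.default_rng(5)
zero = np.zeros((2, 2))

B = np.array([[[lam, 0.0], [0.0, mu]], zero])          # the tensor (8.1)
H_list = [
    np.array([rng.standard_normal((2, 2)), zero]),     # arbitrary first slab
    np.array([zero, [[0.0, 1.0], [0.0, 0.0]]]),        # unit entry a_212
    np.array([zero, [[0.0, 0.0], [1.0, 0.0]]]),        # unit entry a_221
    np.array([zero, [[lam, 0.0], [0.0, mu]]]),         # copy of B in second slab
]

# Delta(B + eps*H) should vanish for every eps and every listed H.
max_abs_delta = max(abs(hyperdet(B + eps * H))
                    for H in H_list for eps in np.linspace(-3.0, 3.0, 13))

# The tensor A forced by the constraints (8.2), and its rank-2 decomposition.
A = B + np.array([zero, [[alpha * mu, 0.0], [0.0, -alpha * lam]]])
inner_products = [float(np.sum((A - B) * H)) for H in H_list]
e1, e2 = np.eye(2)
rank2_form = (outer3(lam * e1 + alpha * mu * e2, e1, e1)
              + outer3(mu * e1 - alpha * lam * e2, e2, e2))

print(max_abs_delta, inner_products, np.allclose(A, rank2_form))
```

The check confirms the two facts the argument relies on: each listed direction keeps $B + \varepsilon H$ on the hypersurface $\Delta = 0$, and the only residuals $A - B$ orthogonal to all of them yield a tensor $A$ of rank at most 2.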

Corollary 8.3. Let $d_1, d_2, d_3 \ge 2$. If $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ is of type $G_3$, then $A$ does
not have an optimal rank-2 approximation.

Proof. We use the projection $\Pi_A$ defined in subsection 2.6. For any $B \in
\mathbb{R}^{d_1 \times d_2 \times d_3}$, Pythagoras' theorem (2.20) gives:

\[
\begin{aligned}
\|B - A\|_F^2 &= \|\Pi_A(B - A)\|_F^2 + \|(I - \Pi_A)(B - A)\|_F^2 \\
&= \|\Pi_A(B) - A\|_F^2 + \|B - \Pi_A(B)\|_F^2.
\end{aligned}
\]

If $B$ is an optimal rank-2 approximation, then it follows that $B = \Pi_A(B)$; for otherwise
$\Pi_A(B)$ would be a better approximation. Thus $B \in U_1 \otimes U_2 \otimes U_3$, where $U_1, U_2, U_3$ are
the supporting subspaces of $A$. These are 2-dimensional, since $\operatorname{rank}_\boxplus(A) = (2, 2, 2)$,
so $U_1 \otimes U_2 \otimes U_3 \cong \mathbb{R}^{2\times 2\times 2}$. The optimality of $B$ now contradicts Theorem 8.1. □
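The Pythagorean splitting used in this proof can be checked on a random instance. The sketch assumes that $\Pi_A$ is the orthogonal projection onto $U_1 \otimes U_2 \otimes U_3$, realized as the multilinear map $(P_1, P_2, P_3)$ built from the orthogonal projectors $P_i$ onto the column spaces of the flattenings of $A$; the helper names are illustrative.

```python
import numpy as np

def unfold(A, mode):
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def support_projectors(A, tol=1e-10):
    """Orthogonal projectors onto the column spaces of the flattenings of A."""
    projectors = []
    for mode in range(A.ndim):
        U, s, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        U = U[:, s > tol]               # orthonormal basis of the supporting subspace
        projectors.append(U @ U.T)
    return projectors

def project(B, projectors):
    """Apply (P1, P2, P3) to an order-3 tensor B."""
    P1, P2, P3 = projectors
    return np.einsum("ia,jb,kc,abc->ijk", P1, P2, P3, B)

rng = np.random.default_rng(2)
core = rng.standard_normal((2, 2, 2))
L, M, N = (rng.standard_normal((d, 2)) for d in (3, 4, 5))
A = np.einsum("ia,jb,kc,abc->ijk", L, M, N, core)   # multilinear rank (2, 2, 2)
B = rng.standard_normal(A.shape)                    # arbitrary competitor

P = support_projectors(A)
PB = project(B, P)

lhs = np.linalg.norm(B - A) ** 2
rhs = np.linalg.norm(PB - A) ** 2 + np.linalg.norm(B - PB) ** 2
print(lhs, rhs)
```

Since the second term on the right is nonnegative, $\Pi_A(B)$ is never farther from $A$ than $B$ is; projecting onto the supporting subspaces also cannot increase the rank, which is what allows the problem to be reduced to $\mathbb{R}^{2\times 2\times 2}$.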
Our final result is that the set of tensors $A$ for which APPROX($A$, 2) has no solution
is a set of positive volume, for all tensor spaces of order 3 except those isomorphic to
a matrix space; in other words, Theorem 1.3. Note that the $G_3$-tensors comprise a
set of zero volume in all cases except $\mathbb{R}^{2\times 2\times 2}$. Here is the precise statement.

Theorem 8.4. Let $d_1, d_2, d_3 \ge 2$. The set of tensors $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ for which
APPROX($A$, 2) does not have a solution (in the Frobenius norm) contains an open
neighborhood of the set of tensors of type $G_3$. In particular, this set is nonempty and
has positive volume.

For $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ let $\mathcal{B}(A)$ denote the set of optimal border-rank-2
approximations for $A$. Since $\overline{S}_2(d_1, d_2, d_3)$ is nonempty and closed, it follows that $\mathcal{B}(A)$ is
nonempty and compact.

We can restate the theorem as follows. Let $A_0$ be an arbitrary $G_3$-tensor. We
must show that if $A$ is close to $A_0$, and $B \in \mathcal{B}(A)$, then $\operatorname{rank}_\otimes(B) > 2$, i.e. $B$ is a
$D_3$-tensor. Our proof strategy is contained in the steps of the following lemma.

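Lemma 8.5. Let $A_0 \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ be a fixed tensor of type $G_3$. Then there exist
positive numbers $\rho = \rho(A_0)$, $\delta = \delta(A_0)$ such that the following statements are true for
all $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$.

(1) If $A$ is a $G_3$-tensor and $B \in \mathcal{B}(A)$, then $B$ is a $D_3$-tensor and $\Pi_B = \Pi_A$.

(2) If $\|A - A_0\|_F < \rho$ and $\operatorname{rank}_\boxplus(A) \le (2, 2, 2)$, then $A$ is a $G_3$-tensor.

(3) If $\|A - A_0\|_F < \delta$ and $B \in \mathcal{B}(A)$, define $A' = \Pi_B(A)$. Then $\|A' - A_0\|_F < \rho$
and $B \in \mathcal{B}(A')$.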

Proof of Theorem 8.4, assuming Lemma 8.5. Fix $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ and suppose
$\|A - A_0\|_F < \delta$. It is not generally true that $\operatorname{rank}_\boxplus(A) \le (2, 2, 2)$, so we cannot
apply (2) directly to $A$. Let $B \in \mathcal{B}(A)$. Then $A' = \Pi_B(A)$ is close to $A_0$, by (3).
Since $\operatorname{rank}_\boxplus(B) \le (2, 2, 2)$ and $\Pi_B$ is the projection onto the subspace spanned by $B$,
it follows that $\operatorname{rank}_\boxplus(A') \le (2, 2, 2)$. Now (2) implies that $A'$ is a $G_3$-tensor. Since
$B \in \mathcal{B}(A')$, by (3), it follows from (1) that $B$ is a $D_3$-tensor. □

Proof of Lemma 8.5 (1). This is essentially Corollary 8.3. $B$ cannot have rank 2
or less, but it has border-rank 2, so $B$ must be a $D_3$-tensor. Since $B = \Pi_A(B)$ it
follows that the supporting subspaces of $B$ are contained in the supporting subspaces
of $A$. However, $\operatorname{rank}_\boxplus(B) = (2, 2, 2) = \operatorname{rank}_\boxplus(A)$, so the two tensors must have the
same supporting subspaces, and so $\Pi_B = \Pi_A$. □

Proof of Lemma 8.5 (2). Let $S_2^{+}(d_1, d_2, d_3)$ denote the set of non-$G_3$ tensors in
$\mathbb{R}^{d_1 \times d_2 \times d_3}$ with $\operatorname{rank}_\boxplus \le (2, 2, 2)$. Since $A_0 \notin S_2^{+}(d_1, d_2, d_3)$, it is enough to show that
$S_2^{+}(d_1, d_2, d_3)$ is closed, for then it would be disjoint from the $\rho$-ball about $A_0$, for
some $\rho > 0$. Note that

\[
S_2^{+}(d_1, d_2, d_3) = \mathrm{O}_{d_1,d_2,d_3}(\mathbb{R}) \cdot S_2^{+}(2, 2, 2).
\]

Now $S_2^{+}(2, 2, 2) = \{ A \in \mathbb{R}^{2\times 2\times 2} \mid \Delta(A) \ge 0 \}$ is a closed subset of $\mathbb{R}^{2\times 2\times 2}$, and the
action of the compact group $\mathrm{O}_{d_1,d_2,d_3}(\mathbb{R})$ is proper. It follows that $S_2^{+}(d_1, d_2, d_3)$ is
closed, as required. □

Proof of Lemma 8.5 (3). We begin with the easier part of the statement, which
is that $B \in \mathcal{B}(A')$. To prove this, we will show that $\|A' - B\|_F \le \|A' - B'\|_F$ whenever
$B' \in \mathcal{B}(A')$, establishing the optimality of $B$ as an approximation to $A'$. Accordingly,
let $B' \in \mathcal{B}(A')$. Since $\Pi_B(A') = A'$, it follows from (2.20) with $\Pi_B$ that

\[
\|A' - B'\|_F^2 = \|A' - \Pi_B(B')\|_F^2 + \|B' - \Pi_B(B')\|_F^2
\]

so, since $B'$ is optimal, we must have $\Pi_B(B') = B'$. We can now apply (2.20) with
$\Pi_B$ to both sides of the inequality $\|A - B\|_F^2 \le \|A - B'\|_F^2$ to get

\[
\|A' - B\|_F^2 + \|A - A'\|_F^2 \le \|A' - B'\|_F^2 + \|A - A'\|_F^2
\]

and hence $\|A' - B\|_F \le \|A' - B'\|_F$, as claimed.

We now turn to the proof that $\Pi_B(A)$ is close to $A_0$ if $A$ is close to $A_0$. This is
required to be uniform in $A$ and $B$. In other words, there exists $\delta = \delta(A_0) > 0$ such
that for all $A$ and all $B \in \mathcal{B}(A)$, if $\|A - A_0\|_F < \delta$ then $\|\Pi_B(A) - A_0\|_F < \rho$. Here
$\rho = \rho(A_0)$ is fixed from part (2) of this lemma.

We need control over the location of $B$. Let $\mathcal{B}_\varepsilon(A_0)$ denote the $\varepsilon$-neighborhood
of $\mathcal{B}(A_0)$ in $\overline{S}_2(d_1, d_2, d_3)$.

Proposition 8.6. Given $\varepsilon > 0$, there exists $\delta > 0$ such that if $\|A - A_0\|_F < \delta$
then $\mathcal{B}(A) \subset \mathcal{B}_\varepsilon(A_0)$.

Proof. The set $\overline{S}_2(d_1, d_2, d_3) \setminus \mathcal{B}_\varepsilon(A_0)$ is closed, and so it attains its minimum
distance from $A_0$. This must exceed the absolute minimum $\|A_0 - B_0\|_F$ for $B_0 \in \mathcal{B}(A_0)$
by a positive quantity $2\delta$, say. If $\|A - A_0\|_F < \delta$ and $B' \in \overline{S}_2(d_1, d_2, d_3) \setminus \mathcal{B}_\varepsilon(A_0)$ then

\[
\begin{aligned}
\|A - B'\|_F &\ge \|B' - A_0\|_F - \|A - A_0\|_F \\
&> \|A_0 - B_0\|_F + 2\delta - \delta \\
&= \|A_0 - B_0\|_F + \delta \\
&> \|A_0 - B_0\|_F + \|A - A_0\|_F \\
&\ge \|A - B_0\|_F
\end{aligned}
\]

using the triangle inequality in the first and last line. Thus $B' \notin \mathcal{B}(A)$. □

We claim that if $\varepsilon$ is small enough, then $\operatorname{rank}_\boxplus(B) = (2, 2, 2)$ for all $B \in \mathcal{B}_\varepsilon(A_0)$.
Indeed, this is already true on $\mathcal{B}(A_0)$, by part (1). Since $\operatorname{rank}_\boxplus$ is upper-semicontinuous
and does not exceed $(2, 2, 2)$ on $\overline{S}_2(d_1, d_2, d_3)$, it must be constant on a neighborhood
of $\mathcal{B}(A_0)$ in $\overline{S}_2(d_1, d_2, d_3)$. Since $\mathcal{B}(A_0)$ is compact, the neighborhood can be taken to
be an $\varepsilon$-neighborhood.

Part (1) implies that $\Pi_{B_0} = \Pi_{A_0}$ for all $B_0 \in \mathcal{B}(A_0)$. If $\varepsilon$ is small enough that
$\operatorname{rank}_\boxplus(B) = (2, 2, 2)$ on $\mathcal{B}_\varepsilon(A_0)$, then $\Pi_B$ depends continuously on $B \in \mathcal{B}_\varepsilon(A_0)$, by
Proposition 2.5. Since $\mathcal{B}(A_0)$ is compact, we can choose $\varepsilon$ small enough so that the
operator norm of $\Pi_B - \Pi_{A_0}$ is as small as we like, uniformly over $\mathcal{B}_\varepsilon(A_0)$.

We are now ready to confine $\Pi_B(A)$ to the $\rho$-neighborhood of $A_0$. Suppose,
initially, that $\|A - A_0\|_F < \rho/2$ and $B \in \mathcal{B}_\varepsilon(A_0)$. Then

\[
\begin{aligned}
\|\Pi_B(A) - A_0\|_F &\le \|(\Pi_B - \Pi_{A_0}) \cdot A\|_F + \|\Pi_{A_0} \cdot A - A_0\|_F \\
&\le \|\Pi_B - \Pi_{A_0}\| \, \|A\|_F + \|\Pi_{A_0} \cdot (A - A_0)\|_F \\
&\le \|\Pi_B - \Pi_{A_0}\| \, (\|A_0\|_F + \rho/2) + \|A - A_0\|_F \\
&< \|\Pi_B - \Pi_{A_0}\| \, (\|A_0\|_F + \rho/2) + \rho/2.
\end{aligned}
\]

Now choose $\varepsilon > 0$ so that the operator norm $\|\Pi_B - \Pi_{A_0}\|$ is kept small enough to
guarantee that the right-hand side is less than $\rho$. For this $\varepsilon$, choose $\delta$ as given by
Proposition 8.6. Ensure also that $\delta < \rho/2$.

Then, if $\|A - A_0\|_F < \delta$ and $B \in \mathcal{B}(A)$, we have $B \in \mathcal{B}_\varepsilon(A_0)$. By the preceding
calculation, $\|A' - A_0\|_F < \rho$. This completes the proof. □
9. Closing remarks. We refer interested readers to [17, 18, 23, 55] for a discussion
of similar issues for symmetric tensors and nonnegative tensors. In particular,
the reader will find in [18] an example of a symmetric tensor of symmetric rank $r$
(which may be chosen to be arbitrarily large) that does not have a best symmetric-rank-2
approximation. In [57, 58], we show that such failures do not occur in the context of
nonnegative tensors: a nonnegative tensor of nonnegative-rank $r$ will always have a
best nonnegative-rank-$s$ approximation for any $s \le r$.

In this paper we have focused our attention on the real case; the complex case
has been studied in great detail in algebraic computational complexity theory and
algebraic geometry. For the interested reader, we note that the rank-jumping
phenomenon still occurs: Proposition 4.6 and its proof carry straight through to the
complex case. On the other hand, there is no distinction between $G_3$ and $G_2$ tensors
over the complex numbers; if $\Delta(A) \ne 0$ then $A$ has rank 2. The results of Section 8
have no direct analogue.

The major open question in tensor approximation is how to overcome the
ill-posedness of APPROX($A$, $r$). In general this will conceivably require an equivalent of
Theorem 5.1 that characterizes the limit points of rank-$r$ order-$k$ tensors. It is our
hope that some of the tools developed in our study, such as Theorems 5.2 and 7.5
(both of which apply to general $r$ and $k$), may be used in future studies. The type
of characterization in Corollary 5.12, for $r = 2$ and $k = 3$, is an example of what one
might hope to achieve.